How to extract file properties of multiple XML files and combine them with the XML extracted data (Using R)


Question

I am fairly new to R and need some help to extract file names and file properties and combine them with data extracted from multiple XML files (about 200), which should then be converted into a data frame.

I am using the following script to select the XML files, extract the data and convert it into a data frame (it runs without errors):

library(XML)
library(plyr)

# Select multiple xml files within the working directory
# ("\\.xml$" matches only names ending in ".xml", case-insensitively)
FileName <- list.files(pattern = "\\.xml$",
                       ignore.case = TRUE,
                       full.names = FALSE)

# Create function to extract data: one row per <o> node of the 'pkg' ObjectList
RI_ID <- function(FileName) {
  doc1 <- xmlParse(FileName)
  xmlToDataFrame(doc1["//ObjectList[@ObjectType='pkg']/o"])
}

# Convert to dataframe
T1 <- ldply(FileName,RI_ID)

# Rename columns
names(T1)[names(T1) == "a"] <- "UniqueInstallationPackageID"
names(T1)[names(T1) == "b"] <- "PackageVersion_Latest"

# Convert to numeric (via as.character, so factor columns yield values, not level codes)
FieldToNumeric <- c("UniqueInstallationPackageID", "PackageVersion_Latest")
T1[, FieldToNumeric] <- lapply(T1[, FieldToNumeric],
                               function(x) as.numeric(as.character(x)))

I would like to (and need some help to):

  • extract the modified date of the XML files, as displayed in Windows Explorer;
  • include the file name and modified date in the final data frame.

I have reviewed two related sources, but did not have any success in implementing them.

Due to a confidentiality agreement, I cannot share an example of the XML file, but, if need be, I can rename the nodes etc. and submit it. Thank you for your help.

Recommended answer

Simply adjust the RI_ID function to retrieve those two pieces of information (the modified date/time via file.info() and the FileName argument) and bind those values as new columns of the XML data frame. transform(), used below, adds columns to a data frame through comma-separated assignments:

# Create function to extract data
RI_ID <- function(FileName) {
  doc <- xmlParse(FileName)
  # transform() appends the file name and modification time as new columns
  transform(xmlToDataFrame(doc["//ObjectList[@ObjectType='pkg']/o"]),
            file_name = FileName,
            file_modified = file.info(FileName)$mtime)
}
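
To see the whole flow end to end, here is a minimal sketch that builds two small XML files in a temporary directory and combines them. The sample XML layout, the helper name read_one and the temporary directory are assumptions made for illustration only; the real files may be structured differently. It uses base R's lapply() and rbind() in place of plyr::ldply(), which is a substitution, not part of the original answer.

```r
library(XML)

# Hypothetical sample data: two small XML files in a temporary directory.
# The <a>/<b> child nodes mirror the column names renamed in the question.
tmp_dir <- tempfile("xml_demo")
dir.create(tmp_dir)
sample_xml <- paste0(
  "<root><ObjectList ObjectType='pkg'>",
  "<o><a>1</a><b>10</b></o>",
  "<o><a>2</a><b>20</b></o>",
  "</ObjectList></root>"
)
writeLines(sample_xml, file.path(tmp_dir, "first.xml"))
writeLines(sample_xml, file.path(tmp_dir, "second.xml"))

# Full paths, so the code does not depend on the working directory
files <- list.files(tmp_dir, pattern = "\\.xml$",
                    ignore.case = TRUE, full.names = TRUE)

# Hypothetical helper: parse one file and tag each row with file metadata
read_one <- function(path) {
  doc <- xmlParse(path)
  nodes <- getNodeSet(doc, "//ObjectList[@ObjectType='pkg']/o")
  df <- xmlToDataFrame(nodes = nodes, stringsAsFactors = FALSE)
  df$file_name <- basename(path)              # name without the directory
  df$file_modified <- file.info(path)$mtime   # POSIXct modification time
  df
}

# Stack the per-file data frames into one
combined <- do.call(rbind, lapply(files, read_one))
print(combined)
```

If the files are listed with full.names = FALSE, as in the question, the working directory has to be the folder that contains them; otherwise both xmlParse() and file.info() will fail to find the files.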
