More direct way to create a list of dataframes from an XML file?


Problem description

SDMX (Statistical Data and Metadata Exchange) is an XML grammar that defines a standard for exchanging statistical data. It uses files called Data Structure Definitions (DSD) to convey the structure of a dataset. Among other things, the DSD contains a Codelists node made up of Codelist items, each of which is the parent of Code elements (carrying an id attribute) and their Name children. I am currently trying to parse the Codelists of a DSD file requested from Eurostat's REST interface into a list of dataframes in R, using the following code:

library(XML)
library(RCurl)

# REST resource for the DSD of nama_gdp_c
# download, parse the XML, and set the root
file <- "http://ec.europa.eu/eurostat/SDMX/diss-web/rest/datastructure/ESTAT/DSD_nama_gdp_c"
content <- getURL(file, httpheader = list('User-Agent' = 'R-Agent'))
root <- xmlRoot(xmlInternalTreeParse(content, useInternalNodes = TRUE))

# get Nodeset of Codelists and its length
nodes <- getNodeSet(root,"//str:Codelist")
nn <- length(nodes)

# Create nested List of all Codes and Names
codelistAll <- lapply(seq(nn),function(i){
  xpathSApply(root,paste0("//str:Codelist[",i,"]/str:Code"),xmlGetAttr, "id")
})

namelistAll <- lapply(seq(nn),function(i){
  xpathSApply(root,paste0("//str:Codelist[",i,"]/str:Code/com:Name"),xmlValue)
})

# Create a list of dataframes from the nested lists
alldfList <-lapply(seq(nn),function(i) data.frame(codes=codelistAll[[i]],names=namelistAll[[i]]))

# Name the list items like the nodes
names(alldfList)  <- sapply(nodes, xmlGetAttr,"id")

This yields alldfList, the list of dataframes I was looking for.

> str(alldfList)
List of 6
 $ CL_FREQ      :'data.frame':  6 obs. of  2 variables:
  ..$ codes: Factor w/ 6 levels "A","D","H","M",..: 2 6 5 1 4 3
  ..$ names: Factor w/ 6 levels "Annual","Daily",..: 2 6 4 1 3 5
 $ CL_GEO       :'data.frame':  49 obs. of  2 variables:
  ..$ codes: Factor w/ 49 levels "AT","BA","BE",..: 22 21 20 10 16 15 14 13 12 11 ...
  ..$ names: Factor w/ 49 levels "Austria","Belgium",..: 19 18 17 16 15 14 13 12 11 10 ...

While this does the job, I have the feeling that there must be a more straightforward syntax to achieve it. The use of paste0 and the final assignment of names seem especially awkward. I have been reading through the documentation of the XML package and I suspect the solution is some operation on xmlChildren, but I cannot wrap my head around how to actually do it. Does anyone have a suggestion for a canonical way of doing this? Any suggestion would be greatly appreciated.

Recommended answer

You can build the data.frames directly from the Codelist nodes by running XPath queries relative to each node, but you need to pass the namespace explicitly:

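# Namespace prefix used in the XPath expressions below
ns <- c(str="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/structure")

# One data.frame per Codelist node; ".//" searches relative to that node
alldfList <- lapply(nodes, function(x){ data.frame(
  codes= xpathSApply(x, ".//str:Code" , xmlGetAttr, "id", namespaces=ns),
  # xmlValue returns the text content of each Code node, i.e. its Name
  names= xpathSApply(x, ".//str:Code" , xmlValue, namespaces=ns) )})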

names(alldfList)  <- sapply(nodes, xmlGetAttr,"id")
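
To try the node-relative approach without calling the Eurostat service, here is a minimal offline sketch. The XML snippet is invented for illustration and only mimics the Codelist/Code/Name nesting described in the question. The `com` namespace URI, the name `toy_dfs`, and the use of `stringsAsFactors = FALSE` are assumptions added here, not part of the original answer.

```r
library(XML)

# Toy document (hypothetical): it imitates the nesting of a DSD,
# but it is not real Eurostat output.
xml_text <- '<mes:Structure
    xmlns:mes="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message"
    xmlns:str="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/structure"
    xmlns:com="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/common">
  <str:Codelists>
    <str:Codelist id="CL_FREQ">
      <str:Code id="A"><com:Name>Annual</com:Name></str:Code>
      <str:Code id="D"><com:Name>Daily</com:Name></str:Code>
    </str:Codelist>
    <str:Codelist id="CL_GEO">
      <str:Code id="AT"><com:Name>Austria</com:Name></str:Code>
    </str:Codelist>
  </str:Codelists>
</mes:Structure>'

doc <- xmlParse(xml_text, asText = TRUE)

# Prefix-to-URI mapping passed to every XPath call
# (the com URI is an assumption made for this toy document)
ns <- c(str = "http://www.sdmx.org/resources/sdmxml/schemas/v2_1/structure",
        com = "http://www.sdmx.org/resources/sdmxml/schemas/v2_1/common")

nodes <- getNodeSet(doc, "//str:Codelist", namespaces = ns)

# Query relative to each Codelist node instead of pasting an index
# into an absolute XPath expression
toy_dfs <- lapply(nodes, function(x) {
  data.frame(
    codes = xpathSApply(x, "./str:Code", xmlGetAttr, "id", namespaces = ns),
    names = xpathSApply(x, "./str:Code/com:Name", xmlValue, namespaces = ns),
    stringsAsFactors = FALSE  # keep plain character columns
  )
})

# Label each data.frame with the id of its Codelist
names(toy_dfs) <- sapply(nodes, xmlGetAttr, "id")
```

Calling `str(toy_dfs)` afterwards shows one data.frame per Codelist. Selecting `./str:Code/com:Name` explicitly, rather than taking `xmlValue` of the whole Code node, is a cautious variation in case a Code element holds text besides its name.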
