More direct way to create list of dataframes from XML file?

Views: 49 · Posted: 2021/5/28 20:21:43 · Tags: xml, r, xpath, lapply

Problem description

SDMX (Statistical Data and Metadata Exchange) is an XML grammar that defines a standard for exchanging statistical data. It uses files called Data Structure Definitions (DSD) to convey the structure of a dataset. Among other things, the DSD contains a Codelists node made up of Codelist items, which in turn are parents to the Code and Name items and their attributes. I am currently trying to parse the Codelists of a DSD file requested from Eurostat's REST interface into a list of dataframes in R using the following code:

library(XML);library(RCurl)

# REST resource for DSD of nama_gdp_c
# download the file, parse the XML and set the root node
file <- "http://ec.europa.eu/eurostat/SDMX/diss-web/rest/datastructure/ESTAT/DSD_nama_gdp_c"
content <- getURL(file, httpheader = list('User-Agent' = 'R-Agent'))
root <- xmlRoot(xmlInternalTreeParse(content, useInternalNodes = TRUE))

# get Nodeset of Codelists and its length
nodes <- getNodeSet(root,"//str:Codelist")
nn <- length(nodes)

# Create nested List of all Codes and Names
codelistAll <- lapply(seq(nn),function(i){
  xpathSApply(root,paste0("//str:Codelist[",i,"]/str:Code"),xmlGetAttr, "id")
})

namelistAll <- lapply(seq(nn),function(i){
  xpathSApply(root,paste0("//str:Codelist[",i,"]/str:Code/com:Name"),xmlValue)
})

# Create a list of dataframes from the nested lists
alldfList <-lapply(seq(nn),function(i) data.frame(codes=codelistAll[[i]],names=namelistAll[[i]]))

# Name the list items like the nodes
names(alldfList)  <- sapply(nodes, xmlGetAttr,"id")

This yields alldfList, the list of dataframes which I was looking for.

> str(alldfList)
List of 6
 $ CL_FREQ      :'data.frame':  6 obs. of  2 variables:
  ..$ codes: Factor w/ 6 levels "A","D","H","M",..: 2 6 5 1 4 3
  ..$ names: Factor w/ 6 levels "Annual","Daily",..: 2 6 4 1 3 5
 $ CL_GEO       :'data.frame':  49 obs. of  2 variables:
  ..$ codes: Factor w/ 49 levels "AT","BA","BE",..: 22 21 20 10 16 15 14 13 12 11 ...
  ..$ names: Factor w/ 49 levels "Austria","Belgium",..: 19 18 17 16 15 14 13 12 11 10 ...

While this does the job, I have the feeling that there must be a more straightforward syntax to achieve this. In particular, the use of paste0 and the final assignment of names seem awkward. I have been reading through the documentation of the XML package and I suspect it must be some operation on xmlChildren, but I cannot wrap my head around how to actually do it. Does anyone have a suggestion for a canonical way of doing this operation? Any suggestion would be greatly appreciated.
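
One possible direction, offered as a sketch and not as the canonical answer: loop over the Codelist nodes themselves and evaluate relative XPath expressions against each node. That removes both the index-based paste0 calls and the two intermediate nested lists. The example below runs on a small hand-written stand-in for a DSD, so it needs no network access. The document layout is inferred from the XPath expressions in the question, the namespace URIs are placeholders, and the `ns` variable and the `Structure` root element are names chosen for illustration only.

```r
library(XML)

# Toy stand-in for a DSD file. The element layout mirrors the XPath
# expressions in the question; the namespace URIs are placeholders.
xml_text <- '<Structure xmlns:str="http://example.org/structure"
           xmlns:com="http://example.org/common">
  <str:Codelists>
    <str:Codelist id="CL_FREQ">
      <str:Code id="D"><com:Name>Daily</com:Name></str:Code>
      <str:Code id="A"><com:Name>Annual</com:Name></str:Code>
    </str:Codelist>
    <str:Codelist id="CL_GEO">
      <str:Code id="AT"><com:Name>Austria</com:Name></str:Code>
    </str:Codelist>
  </str:Codelists>
</Structure>'

doc  <- xmlParse(xml_text, asText = TRUE)
root <- xmlRoot(doc)

# Prefix-to-URI map read from the root element. It is passed explicitly
# because a Codelist node carries no namespace declarations of its own.
ns <- xmlNamespaceDefinitions(root, simplify = TRUE)

nodes <- getNodeSet(root, "//str:Codelist", namespaces = ns)

# Build one data frame per Codelist node using paths relative to that node,
# and attach the Codelist ids as list names in the same expression.
alldfList <- setNames(
  lapply(nodes, function(cl) {
    data.frame(
      codes = xpathSApply(cl, "./str:Code", xmlGetAttr, "id", namespaces = ns),
      names = xpathSApply(cl, "./str:Code/com:Name", xmlValue, namespaces = ns),
      stringsAsFactors = FALSE
    )
  }),
  sapply(nodes, xmlGetAttr, "id")
)

str(alldfList)
```

Applied to the real file, `root` would come from the download step in the question. This sketch assumes every Code element has exactly one Name child; if some have several (for example one per language), the two columns would differ in length and the path for names would need a more specific predicate.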
