xml parser in R with hierarchical nodes, tags and values

Question

I am trying to parse the sample attributes (preferably all of them) from the following XML file. I tried a couple of things, but the XML gets clumped into one node:

library(XML)

xml.url <- "http://www.ebi.ac.uk/ena/data/view/ERS445758&display=xml"
xmlfile <- xmlTreeParse(xml.url)
xmltop <- xmlRoot(xmlfile)
# xmlValue() returns all text below a node, so nested values end up lumped together
IBDcat <- xmlSApply(xmltop, function(x) xmlSApply(x, xmlValue))

I also tried the solutions mentioned in "How to parse XML to R data frame" and "how to create an R data frame from a xml file", but when I try something like:

data <- xmlParse("http://www.ebi.ac.uk/ena/data/view/ERS445758&display=xml")
xml_data <- xmlToList(data)
xmlToDataFrame(nodes=getNodeSet(data,"/SAMPLE_ATTRIBUTE"))[c("age","sex","body site","body-mass index")]

I get an error saying "undefined columns selected".

Any help would be appreciated!

Recommended answer

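At least for your second attempt, you just needed to select every SAMPLE_ATTRIBUTE node, wherever it sits in the tree, by using // instead of /. The result has one row per attribute with TAG, VALUE and UNITS columns, so you then subset the rows by tag rather than selecting columns by name.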

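# xml.url is the URL defined in the question
doc <- xmlParse(xml.url)
# "//" matches SAMPLE_ATTRIBUTE nodes at any depth of the document
x <- xmlToDataFrame(getNodeSet(doc, "//SAMPLE_ATTRIBUTE"))
## OR, indexing the parsed document directly with the XPath expression
xmlToDataFrame(doc["//SAMPLE_ATTRIBUTE"])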
                  TAG      VALUE UNITS
1  investigation type metagenome  <NA>
2        project name       BMRP  <NA>
3 experimental factor microbiome  <NA>
4         target gene   16S rRNA  <NA>
5  target subfragment       V1V2  <NA>
...
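
To see why the leading // matters, here is a minimal sketch on a small inline document, so it runs without network access. The XML is a made-up stand-in: the ROOT, SAMPLE and SAMPLE_ATTRIBUTES wrapper elements are assumptions for illustration, not the real ENA layout. Only the SAMPLE_ATTRIBUTE, TAG, VALUE and UNITS names are taken from the output above.

```r
library(XML)

# Hypothetical miniature document; the wrapper element names are assumed
xml_text <- '<ROOT><SAMPLE><SAMPLE_ATTRIBUTES>
  <SAMPLE_ATTRIBUTE><TAG>age</TAG><VALUE>28</VALUE><UNITS>years</UNITS></SAMPLE_ATTRIBUTE>
  <SAMPLE_ATTRIBUTE><TAG>sex</TAG><VALUE>male</VALUE></SAMPLE_ATTRIBUTE>
</SAMPLE_ATTRIBUTES></SAMPLE></ROOT>'
doc <- xmlParse(xml_text, asText = TRUE)

# "/SAMPLE_ATTRIBUTE" only matches a root element with that name,
# and the root here is ROOT, so nothing is selected
n_absolute <- length(getNodeSet(doc, "/SAMPLE_ATTRIBUTE"))

# "//SAMPLE_ATTRIBUTE" matches the element at any depth
n_anywhere <- length(getNodeSet(doc, "//SAMPLE_ATTRIBUTE"))

# One row per SAMPLE_ATTRIBUTE; the child element names become the columns
x <- xmlToDataFrame(nodes = getNodeSet(doc, "//SAMPLE_ATTRIBUTE"),
                    stringsAsFactors = FALSE)
print(c(absolute = n_absolute, anywhere = n_anywhere))
print(names(x))
```

Because the columns are named after the child elements and not after the tag values, selecting a column called "age" cannot work. That is consistent with the "undefined columns selected" error in the question.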


subset(x, TAG %in% c("age","sex","body site","body-mass index") )
               TAG         VALUE UNITS
15             age            28 years
16             sex          male  <NA>
17       body site Sigmoid colon  <NA>
19 body-mass index    16.9550173  <NA>
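
If you would rather look values up by tag name, as the column selection in the question tried to do, one possible approach is to reshape the TAG/VALUE pairs yourself. This is only a sketch: the data frame below is a hand-typed stand-in for the subset() result above, and the names attrs and wide are made up for illustration.

```r
# Hand-typed stand-in for the subset() result shown above
x <- data.frame(
  TAG   = c("age", "sex", "body site", "body-mass index"),
  VALUE = c("28", "male", "Sigmoid colon", "16.9550173"),
  stringsAsFactors = FALSE
)

# Named character vector: values can be looked up by tag
attrs <- setNames(x$VALUE, x$TAG)
age <- as.numeric(attrs[["age"]])  # values arrive as text, so convert as needed

# One-row data frame with one column per tag; going through a one-row
# matrix keeps names such as "body site" unchanged
wide <- as.data.frame(t(attrs), stringsAsFactors = FALSE)
print(wide[c("age", "sex", "body site", "body-mass index")])
```

Tags that are missing from a given sample will simply be absent from attrs, so check names(attrs) before indexing if the set of attributes varies between samples.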
