XML parser in R with hierarchical nodes, tags and values
Question
I am trying to parse the SAMPLE_ATTRIBUTE entries (preferably all of them) from the following XML file. I tried a couple of things, but the XML gets clumped into one node:
library(XML)  # provides xmlTreeParse, xmlRoot, xmlSApply and xmlValue

xml.url <- "http://www.ebi.ac.uk/ena/data/view/ERS445758&display=xml"
xmlfile <- xmlTreeParse(xml.url)
xmltop <- xmlRoot(xmlfile)
IBDcat <- xmlSApply(xmltop, function(x) xmlSApply(x, xmlValue))
I also tried the solutions mentioned in "How to parse XML to R data frame" and "How to create an R data frame from a xml file", but when I try something like:
data <- xmlParse("http://www.ebi.ac.uk/ena/data/view/ERS445758&display=xml")
xml_data <- xmlToList(data)
xmlToDataFrame(nodes=getNodeSet(data,"/SAMPLE_ATTRIBUTE"))[c("age","sex","body site","body-mass index")]
I get an error saying "undefined columns selected".
Any help would be greatly appreciated!
Recommended answer
At least for your second attempt, you just needed to select every SAMPLE_ATTRIBUTE node, wherever it sits in the tree, by using // in the XPath expression. Then subset the resulting data frame by TAG.
doc <- xmlParse(xml.url)
x <- xmlToDataFrame(getNodeSet(doc,"//SAMPLE_ATTRIBUTE"))
## OR
xmlToDataFrame(doc["//SAMPLE_ATTRIBUTE"])
                  TAG      VALUE UNITS
1  investigation type metagenome  <NA>
2        project name       BMRP  <NA>
3 experimental factor microbiome  <NA>
4         target gene   16S rRNA  <NA>
5  target subfragment       V1V2  <NA>
...
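The difference between the two XPath forms can be seen on a small inline document. This is only a sketch: the XML string below is a hypothetical, simplified stand-in for the ENA record (its nesting is an assumption), and the names xml_text, root_only, anywhere and attrs are made up for illustration.

```r
library(XML)

# Hypothetical, simplified stand-in for the ENA record (structure assumed)
xml_text <- '<ROOT>
  <SAMPLE>
    <SAMPLE_ATTRIBUTES>
      <SAMPLE_ATTRIBUTE><TAG>age</TAG><VALUE>28</VALUE><UNITS>years</UNITS></SAMPLE_ATTRIBUTE>
      <SAMPLE_ATTRIBUTE><TAG>sex</TAG><VALUE>male</VALUE></SAMPLE_ATTRIBUTE>
    </SAMPLE_ATTRIBUTES>
  </SAMPLE>
</ROOT>'

doc <- xmlParse(xml_text, asText = TRUE)

# "/SAMPLE_ATTRIBUTE" is an absolute path: it only matches a root element
# with that name, and the root here is ROOT, so the node set is empty
root_only <- getNodeSet(doc, "/SAMPLE_ATTRIBUTE")

# "//SAMPLE_ATTRIBUTE" matches SAMPLE_ATTRIBUTE elements at any depth
anywhere <- getNodeSet(doc, "//SAMPLE_ATTRIBUTE")

length(root_only)
length(anywhere)

# One row per SAMPLE_ATTRIBUTE node, one column per child element
attrs <- xmlToDataFrame(anywhere)
attrs
```

The result has columns named after the child elements (TAG, VALUE, UNITS) rather than after the tag values, which is likely why selecting columns such as "age" in the question failed. Applied to the real record, the subset by tag is: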
subset(x, TAG %in% c("age","sex","body site","body-mass index") )
               TAG         VALUE UNITS
15             age            28 years
16             sex          male  <NA>
17       body site Sigmoid colon  <NA>
19 body-mass index    16.9550173  <NA>
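If the goal is the column-per-attribute layout that the question's indexing implied, the TAG/VALUE pairs can be turned into a named vector or a one-row data frame. This is a minimal sketch in base R, not part of the original answer: the data frame x below is a hand-written stand-in for the subset shown above, and the names wanted, values and wide are hypothetical.

```r
# Hand-written stand-in for the subset of `x` shown above
x <- data.frame(
  TAG   = c("age", "sex", "body site", "body-mass index"),
  VALUE = c("28", "male", "Sigmoid colon", "16.9550173"),
  UNITS = c("years", NA, NA, NA),
  stringsAsFactors = FALSE
)

wanted <- c("age", "sex", "body site", "body-mass index")

# Named character vector: names are the tags, elements are the values.
# as.character() guards against factor columns from older R/XML versions.
values <- setNames(as.character(x$VALUE), as.character(x$TAG))

# One-row data frame with one column per tag; check.names = FALSE keeps
# names such as "body site" and "body-mass index" unchanged
wide <- data.frame(t(values[wanted]), check.names = FALSE,
                   stringsAsFactors = FALSE)

values[["sex"]]
wide[["body-mass index"]]
```

All values stay character strings here, so numeric fields such as age would still need as.numeric() before any arithmetic.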