How to create an R data frame from an XML file
Question
I have an XML document. Part of the file looks like this:
<attr>
<attrlabl>COUNTY</attrlabl>
<attrdef>County abbreviation</attrdef>
<attrtype>Text</attrtype>
<attwidth>1</attwidth>
<atnumdec>0</atnumdec>
<attrdomv>
<edom>
<edomv>C</edomv>
<edomvd>Clackamas County</edomvd>
<edomvds/>
</edom>
<edom>
<edomv>M</edomv>
<edomvd>Multnomah County</edomvd>
<edomvds/>
</edom>
<edom>
<edomv>W</edomv>
<edomvd>Washington County</edomvd>
<edomvds/>
</edom>
</attrdomv>
</attr>
From this XML file, I want to create an R data frame with the columns attrlabl, attrdef, attrtype, and attrdomv. Note that the attrdomv column should list every level of the categorical variable. The data frame should look like this:
attrlabl  attrdef              attrtype  attrdomv
COUNTY    County abbreviation  Text      C Clackamas County; M Multnomah County; W Washington County
This is the incomplete code I have so far:
doc <- xmlParse("taxlots.shp.xml")
dataDictionary <- xmlToDataFrame(getNodeSet(doc,"//attrlabl"))
Could you please complete my R code? I appreciate any help!
Recommended answer
Assuming this is the correct taxlots.shp.xml file:
<attr>
<attrlabl>COUNTY</attrlabl>
<attrdef>County abbreviation</attrdef>
<attrtype>Text</attrtype>
<attwidth>1</attwidth>
<atnumdec>0</atnumdec>
<attrdomv>
<edom>
<edomv>C</edomv>
<edomvd>Clackamas County</edomvd>
<edomvds/>
</edom>
<edom>
<edomv>M</edomv>
<edomvd>Multnomah County</edomvd>
<edomvds/>
</edom>
<edom>
<edomv>W</edomv>
<edomvd>Washington County</edomvd>
<edomvds/>
</edom>
</attrdomv>
</attr>
You are almost there:
library(XML)
doc1 <- xmlParse("taxlots.shp.xml")
xmlToDataFrame(nodes=getNodeSet(doc1,"//attr"))[c("attrlabl","attrdef","attrtype","attrdomv")]
attrlabl attrdef attrtype attrdomv
1 COUNTY County abbreviation Text CClackamas CountyMMultnomah CountyWWashington County
But the last field is not in the format you wanted, because the text of every nested element is concatenated without separators. Getting the desired format takes a few additional steps:
step1 <- xmlToDataFrame(nodes=getNodeSet(doc1,"//attrdomv/edom"))
step1
edomv edomvd edomvds
1 C Clackamas County
2 M Multnomah County
3 W Washington County
step2 <- paste(paste(step1$edomv, step1$edomvd, sep=" "), collapse="; ")
step2
[1] "C Clackamas County; M Multnomah County; W Washington County"
cbind(xmlToDataFrame(nodes= getNodeSet(doc1, "//attr"))[c("attrlabl", "attrdef", "attrtype")],
attrdomv= step2)
attrlabl attrdef attrtype attrdomv
1 COUNTY County abbreviation Text C Clackamas County; M Multnomah County; W Washington County
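The steps above select every edom element in the document at once, which works here because the sample contains a single attr. If the real file holds several attr elements, one possible generalization is to build one row per attr node using XPath expressions relative to that node. The following is only a sketch under assumptions: the inlined two-attribute document (including the ZONE entry and the attrs wrapper) and the helper names child_text and attr_to_row are hypothetical and not part of the original file or of the XML package.

```r
library(XML)

# Hypothetical sample, inlined so the sketch is self-contained.
# The second <attr> has no <attrdomv>, to show how missing levels are handled.
xml_text <- '<attrs>
  <attr>
    <attrlabl>COUNTY</attrlabl>
    <attrdef>County abbreviation</attrdef>
    <attrtype>Text</attrtype>
    <attrdomv>
      <edom><edomv>C</edomv><edomvd>Clackamas County</edomvd><edomvds/></edom>
      <edom><edomv>M</edomv><edomvd>Multnomah County</edomvd><edomvds/></edom>
    </attrdomv>
  </attr>
  <attr>
    <attrlabl>ZONE</attrlabl>
    <attrdef>Zoning code</attrdef>
    <attrtype>Text</attrtype>
  </attr>
</attrs>'
doc <- xmlParse(xml_text, asText = TRUE)

# Hypothetical helper: text of the first match of `path` below `node`,
# or NA when nothing matches.
child_text <- function(node, path) {
  vals <- unlist(xpathSApply(node, path, xmlValue))
  if (length(vals) == 0) NA_character_ else vals[[1]]
}

# Hypothetical helper: turn one <attr> node into a one-row data frame.
attr_to_row <- function(node) {
  # Only the <edom> children of this particular <attr> are visited.
  edoms <- getNodeSet(node, "./attrdomv/edom")
  levels <- vapply(
    edoms,
    function(e) paste(child_text(e, "./edomv"), child_text(e, "./edomvd")),
    character(1)
  )
  data.frame(
    attrlabl = child_text(node, "./attrlabl"),
    attrdef  = child_text(node, "./attrdef"),
    attrtype = child_text(node, "./attrtype"),
    attrdomv = paste(levels, collapse = "; "),  # "" when there are no levels
    stringsAsFactors = FALSE
  )
}

# One row per <attr>, stacked into a single data frame.
dataDictionary <- do.call(rbind, lapply(getNodeSet(doc, "//attr"), attr_to_row))
print(dataDictionary)
```

With the real file, the inlined text would be replaced by xmlParse("taxlots.shp.xml"). Keeping the XPath expressions relative (starting with "./") is what stops the levels of one attribute from leaking into another.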