R dataframe from XML when values are multiple or missing

Views: 146 | Posted: 2017/3/26 0:11:03 | Tags: xml r xpath import dataframe


Problem description

 
This question is similar to a previous question, Import all fields (and subfields) of XML as dataframe, but I want to pull out only a subset of the XML data and want to include missing/multiple values.


I start with an XML file and want to construct a dataframe in R based on some of the data it contains, defined by the contents of XML elements. It is easiest to explain with an example. In the XML below, I want to pick out the information about landmarks for every city (even if a city has no landmark element, or has several) and ignore the information about stations.
<world>
    <city>
        <name>London</name>
        <buildings>
            <building>
                <type>landmark</type>
                <bname>Tower Bridge</bname>
            </building>
            <building>
                <type>station</type>
                <bname>Waterloo</bname>
            </building>
        </buildings>
    </city>
    <city>
        <name>New York</name>
        <buildings>
            <building>
                <type>station</type>
                <bname>Grand Central</bname>
            </building>
        </buildings>
    </city>
    <city>
        <name>Paris</name>
        <buildings>
            <building>
                <type>landmark</type>
                <bname>Eiffel Tower</bname>
            </building>
            <building>
                <type>landmark</type>
                <bname>Louvre</bname>
            </building>
        </buildings>
    </city>
</world>
Ideally this would go into a dataframe that looks something like this:
 London      Tower Bridge
 New York    NA
 Paris       Eiffel Tower
 Paris       Louvre
I assumed there might be a way to do this using the XML library and xpathSApply but I think I'm beaten.

Also couldn't think how to phrase the question without just referring to the example so feel free to edit to give a more descriptive question.
Solution
Assuming the XML data is in a file called world.xml, read it in and iterate over the cities, extracting the city name and the bname of any associated landmarks:
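library(XML)
doc <- xmlParse("world.xml", useInternalNodes = TRUE)

# Build one small data frame per <city> node, then stack them with rbind
do.call(rbind, xpathApply(doc, "/world/city", function(node) {

   city <- xmlValue(node[["name"]])

   # Relative XPath: bname of every building in this city whose type is 'landmark'
   xp <- "./buildings/building[./type/text()='landmark']/bname"
   landmark <- xpathSApply(node, xp, xmlValue)
   # No match may come back as NULL or as an empty list, so test the length
   if (length(landmark) == 0) landmark <- NA

   # city is recycled when a city has several landmarks
   data.frame(city, landmark, stringsAsFactors = FALSE)

}))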
The result is:
      city     landmark
1   London Tower Bridge
2 New York         <NA>
3    Paris Eiffel Tower
4    Paris       Louvre
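
For anyone who wants to try this without creating world.xml first, here is a minimal self-contained sketch of the same approach. It assumes the XML package is installed; the inline string `xml_text` and the helper name `city_landmarks` are made up for illustration and are not part of the original answer. It tests `length(landmark) == 0` rather than `is.null()`, since a non-matching XPath may come back as an empty list rather than NULL.

```r
library(XML)

# Inline copy of the example data (hypothetical variable name) so the
# sketch runs without a world.xml file on disk
xml_text <- "<world>
  <city><name>London</name><buildings>
    <building><type>landmark</type><bname>Tower Bridge</bname></building>
    <building><type>station</type><bname>Waterloo</bname></building>
  </buildings></city>
  <city><name>New York</name><buildings>
    <building><type>station</type><bname>Grand Central</bname></building>
  </buildings></city>
  <city><name>Paris</name><buildings>
    <building><type>landmark</type><bname>Eiffel Tower</bname></building>
    <building><type>landmark</type><bname>Louvre</bname></building>
  </buildings></city>
</world>"

# asText = TRUE tells xmlParse the argument is XML content, not a file name
doc <- xmlParse(xml_text, asText = TRUE)

# Hypothetical helper: returns one data frame per <city> node
city_landmarks <- function(node) {
  city <- xmlValue(node[["name"]])
  xp <- "./buildings/building[./type/text()='landmark']/bname"
  # unlist() turns an empty list into NULL, so the length test covers both cases
  landmark <- unlist(xpathSApply(node, xp, xmlValue))
  if (length(landmark) == 0) landmark <- NA_character_
  data.frame(city, landmark, stringsAsFactors = FALSE)
}

result <- do.call(rbind, xpathApply(doc, "/world/city", city_landmarks))
print(result)
```

This should give the same four rows as the result above, with New York kept as a single row whose landmark is NA.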


                        