How to loop through XML nodes in R


Question

I need to split an XML document into multiple nodes, and then split each node separately into further sub-nodes. I am using the xpathSApply/getNodeSet functions from the XML package. But it seems that once the XML document has been split into nodes, each node is of class "internal node", so I cannot run XPath operations on it unless I first save it as XML using saveXML(). Any ideas on how this can be done without calling saveXML()? For example, consider the sample XML below:

<array>
<ResidentialProperty>
    <Listing>
      <StreetAddress>
        <StreetNumber>11111</StreetNumber>
        <StreetName>111th</StreetName>
        <StreetSuffix>Avenue Ct</StreetSuffix>
        <StateOrProvince>WA</StateOrProvince>
      </StreetAddress>
      <MLSInformation>
        <ListingStatus Status="Active"/>
        <StatusChangeDate>2015-07-05T23:48:53.410</StatusChangeDate>
      </MLSInformation>
      <GeographicData>
        <Latitude>11.111111</Latitude>
        <Longitude>-111.111111</Longitude>
        <County>Pierce</County>
      </GeographicData>
</ResidentialProperty>
<ResidentialProperty>
    <Listing>
      <StreetAddress>
        <StreetNumber>11211</StreetNumber>
        <StreetName>11111334th</StreetName>
        <StreetSuffix>Av1enue Ct</StreetSuffix>
        <StateOrProvince>WA</StateOrProvince>
      </StreetAddress>
      <MLSInformation>
        <ListingStatus Status="Active"/>
        <StatusChangeDate>2017-07-05T23:48:53.410</StatusChangeDate>
      </MLSInformation>
      <GeographicData>
        <Latitude>11.111111</Latitude>
        <Longitude>-111.111111</Longitude>
        <County>Pie2rce</County>
      </GeographicData>
</ResidentialProperty>
</array>

I intend to:
1. Split the above into two separate nodes, each with root ResidentialProperty.
2. Then be able to perform XPath operations on each of these nodes.

P.S.: This is sample data and does not resemble the actual data set I am working with. I only use it to explain the problem I am trying to solve.

Recommended answer

EDIT: I think I misunderstood the question. Here is a new approach.

We use xpathApply, toString.XMLNode and xmlParseString to extract specific nodes into 2 separate objects.

Parse the XML file and extract the nodes:

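library(XML)
doc=xmlParse("pathtoyourXML.xml")
# xpathApply returns a list of nodes, so take the first element with [[1]]
# before serializing it and re-parsing it as a standalone object
result1=xmlParseString(toString.XMLNode(xpathApply(doc,"(//ResidentialProperty)[1]")[[1]]))
result2=xmlParseString(toString.XMLNode(xpathApply(doc,"(//ResidentialProperty)[2]")[[1]]))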

We now have 2 objects, which we can query with:

from.result1=xpathApply(result1,"//StreetAddress")
from.result2=xpathApply(result2,"//StreetAddress")

Sidenote: your XML is not well-formed. The Listing elements are not closed.
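To illustrate the sidenote, here is a small sketch of how an unclosed element shows up when parsing. The fragment in `broken` is a hypothetical, shortened stand-in for the sample data, and the variable names are illustrative; it assumes the XML package's default error handling, which should raise an R error on malformed input.

```r
library(XML)

# Hypothetical fragment reproducing the problem: <Listing> is never closed.
broken <- "<ResidentialProperty><Listing><County>Pierce</County></ResidentialProperty>"

# Catch the parser error and return NULL instead of stopping the script.
parsed <- tryCatch(
  xmlParse(broken, asText = TRUE),
  error = function(e) NULL
)

# A NULL result means the text could not be parsed as XML.
is.null(parsed)
```

Adding the missing </Listing> before </ResidentialProperty> should make the same call return a document.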

EDIT 2: In fact, you can use xpathApply directly on a previously "extracted" node set:

foo=xpathApply(doc,"(//ResidentialProperty)[2]")
xpathApply(foo[[1]],"//StreetAddress")

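Note that foo[[1]] is not a standalone copy of the second ResidentialProperty: it still belongs to the original document. An absolute path such as //StreetAddress is therefore evaluated against the whole document and returns the StreetAddress nodes of every ResidentialProperty, not only the one selected by (//ResidentialProperty)[2]. To restrict the search to the extracted node, use a relative path such as .//StreetAddress.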
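A minimal sketch of looping over the extracted nodes, assuming a trimmed-down, well-formed version of the sample XML. The inline string and the names `xml_text`, `props` and `street_numbers` are illustrative and not from the original post. Each query uses a relative path (`.//`) so that it stays inside its own node.

```r
library(XML)

# Trimmed, well-formed stand-in for the sample data (illustrative only).
xml_text <- '<array>
  <ResidentialProperty>
    <Listing>
      <StreetAddress><StreetNumber>11111</StreetNumber></StreetAddress>
    </Listing>
  </ResidentialProperty>
  <ResidentialProperty>
    <Listing>
      <StreetAddress><StreetNumber>11211</StreetNumber></StreetAddress>
    </Listing>
  </ResidentialProperty>
</array>'

doc <- xmlParse(xml_text, asText = TRUE)

# One internal node per ResidentialProperty; no saveXML() round trip needed.
props <- getNodeSet(doc, "//ResidentialProperty")

# Loop over the nodes; ".//" restricts each query to the current node.
street_numbers <- sapply(props, function(node) {
  xpathSApply(node, ".//StreetNumber", xmlValue)
})

# For comparison: an absolute path on a single node still searches the
# whole document, so this counts the StreetAddress nodes of all properties.
n_absolute <- length(xpathApply(props[[2]], "//StreetAddress"))
```

Here `street_numbers` holds one value per property, while `n_absolute` counts matches across the whole document, which is why the relative form is usually the one wanted inside a loop.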
