Dealing with empty XML nodes in R

Question

I have the following XML file (the editor would not let me include the real root node, so please assume there is one):

<Indvls>
    <Indvl>
        <Info lastNm="HANSON" firstNm="LAURIE"/>
        <CrntEmps>
            <CrntEmp orgNm="ABC INCORPORATED" str1="FOURTY FOUR BRYANT PARK" city="NEW YORK" state="NY" cntry="UNITED STATES" postlCd="10036">
            <BrnchOfLocs>
                <BrnchOfLoc str1="833 NE 55TH ST" city="BELLEVUE" state="WA" cntry="UNITED STATES" postlCd="98004"/>
            </BrnchOfLocs>
            </CrntEmp>
        </CrntEmps>
    </Indvl>
    <Indvl>
        <Info lastNm="JACKSON" firstNm="SHERRY"/>
        <CrntEmps>
            <CrntEmp orgNm="XYZ INCORPORATED" str1="3411 GEORGE STREET" city="SAN FRANCISCO" state="CA" cntry="UNITED STATES" postlCd="94105">
            <BrnchOfLocs>
            </BrnchOfLocs>
            </CrntEmp>
        </CrntEmps>
    </Indvl>
</Indvls>

Using R, I want to extract the following columns into a table: (a) lastNm and firstNm from the Info node, which are always present; (b) orgNm from the CrntEmps/CrntEmp node, which is always present; and (c) str1, city, and state from the CrntEmps/CrntEmp/BrnchOfLocs/BrnchOfLoc node, which may or may not be present (in my example the second individual does NOT have an office location address).

My challenge is that many Indvl nodes will not have a BrnchOfLoc node. I want to create a row even when that node is missing, with NA in the missing columns (otherwise the columns have unequal lengths and I get an error when building the data frame).

Any thoughts or suggestions? I appreciate any input.

Addendum: here is my code:

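library(XML)

# Return attribute `attr` of the nodes matched by XPath `xp` relative to node `n`,
# or `default` when nothing matches.
xmlGetNodeAttr <- function(n, xp, attr, default=NA) {
    ns <- getNodeSet(n, xp)
    if (length(ns) < 1) {
        return(default)
    } else {
        sapply(ns, xmlGetAttr, attr, default)
    }
}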

do.call(rbind, lapply(xmlChildren(xmlRoot(doc)), function(x) {
data.frame(
    fname=xmlGetNodeAttr(x, "//Info","firstNm",NA),
    lname=xmlGetNodeAttr(x, "//Info","lastNm",NA),
  orgname=xmlGetNodeAttr(x,"//CrntEmps/CrntEmp[1]","orgNm",NA),
    zip=xmlGetNodeAttr(x, "//CrntEmps/CrntEmp[1]/BrnchOfLocs/BrnchOfLoc[1]","city",NA)
)
}))

Recommended answer

You should use:

do.call(rbind, lapply(xmlChildren(xmlRoot(doc)), function(x) {
data.frame(
    fname=xmlGetNodeAttr(x, "./Info","firstNm",NA),
    lname=xmlGetNodeAttr(x, "./Info","lastNm",NA),
    orgname=xmlGetNodeAttr(x, "./CrntEmps/CrntEmp[1]","orgNm",NA),
    zip=xmlGetNodeAttr(x, "./CrntEmps/CrntEmp[1]/BrnchOfLocs/BrnchOfLoc[1]","city",NA)
)
}))

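Note the use of ./ rather than //. A path that begins with // is absolute: it searches the entire document, ignoring the current node that you are lapply-ing over, so every call matches the Info and CrntEmp nodes of all individuals. A path that begins with ./ starts at the current x node and only follows the steps below it, so a missing BrnchOfLoc yields no match and the helper falls back to NA. This returns: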

        fname   lname          orgname      zip
Indvl  LAURIE  HANSON ABC INCORPORATED BELLEVUE
Indvl1 SHERRY JACKSON XYZ INCORPORATED     <NA>
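
The difference between the two path styles can be checked directly by counting matches per individual. The following is a minimal sketch, assuming the XML package is installed; the inline document is a trimmed copy of the example above, and the names xml_text, indvls, abs_counts, rel_counts, and branch_counts are introduced here purely for illustration.

```r
library(XML)

# Trimmed copy of the example document (only the attributes needed here).
xml_text <- '<Indvls>
  <Indvl>
    <Info lastNm="HANSON" firstNm="LAURIE"/>
    <CrntEmps>
      <CrntEmp orgNm="ABC INCORPORATED">
        <BrnchOfLocs>
          <BrnchOfLoc city="BELLEVUE"/>
        </BrnchOfLocs>
      </CrntEmp>
    </CrntEmps>
  </Indvl>
  <Indvl>
    <Info lastNm="JACKSON" firstNm="SHERRY"/>
    <CrntEmps>
      <CrntEmp orgNm="XYZ INCORPORATED">
        <BrnchOfLocs/>
      </CrntEmp>
    </CrntEmps>
  </Indvl>
</Indvls>'

doc <- xmlParse(xml_text, asText = TRUE)

# One node per individual.
indvls <- getNodeSet(doc, "/Indvls/Indvl")

# Absolute path: evaluated against the whole document for every individual.
abs_counts <- sapply(indvls, function(x) length(getNodeSet(x, "//Info")))

# Relative path: evaluated against the current individual only.
rel_counts <- sapply(indvls, function(x) length(getNodeSet(x, "./Info")))

# Relative path to the optional branch node; no match means the helper returns NA.
branch_counts <- sapply(indvls, function(x) {
  length(getNodeSet(x, "./CrntEmps/CrntEmp[1]/BrnchOfLocs/BrnchOfLoc[1]"))
})

print(abs_counts)
print(rel_counts)
print(branch_counts)
```

If the explanation above holds, the absolute path matches both Info nodes for each individual, the relative path matches exactly one, and the branch count is zero for the second individual, which is the case where xmlGetNodeAttr returns its default.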
