R: Insert node into xml tree at specific location

Problem description

I have an xml file with a structure like this (large example to show the needed flexibility):

<rootnode sth="something" descr="ex">
  <tag sth="sth1" descr="ex" anoAttr="sth2">
    <tag sth="sth3" descr="ex2" searchA="sth4" anoAttr="sth5">
      <tag sth="sth6" descr="ex3" oAttr="sth7" searchA="sth8" anoAttr="sth9">
        <tag sth="sth10" descr="ex4" oAttr="sth11" searchA="sth12" anoAttr="sth13">
          <someContent/>
        </tag>
        <someContent/>
      </tag>
      <tag sth="sth14" descr="ex5" oAttr="sth15" searchA="sth16" anoAttr="sth17">
        <someContent/>
      </tag>
      <tag sth="sth1" descr="ex6" oAttr="sth15" searchA="sth18" anoAttr="sth17">
        <someContent/>
      </tag>
    </tag>
    <tag sth="sth10" descr="ex2" oAttr="sth19" searchA="sth20" anoAttr="sth9">
      <someContent/>
    </tag>
    <tag sth="sth10" descr="ex7" searchA="sth21" anoAttr="sth13">
      <tag sth="sth21" descr="ex8" oAttr="sth22" searchA="sth23" anoAttr="sth9">
        <tag sth="sth23" descr="ex9" oAttr="sth22" searchA="sth24" anoAttr="sth5">
          <someContent/>
        </tag>
        <someContent/>
      </tag>
    </tag>
  </tag>
  <otherNode>
    <someNode/>
  </otherNode>
</rootnode>

Specifically, the size of any of the tag nodes is unknown, the number of attributes is not equal for all tag nodes and the values of the attributes are not unique.
What I do know, however, is that the value of the searchA attribute is unique. Also, only tag nodes can contain an attribute called searchA and all of them except the top level one do.

I first parse this document using the XML package with the function xmlTreeParse() and store the root node. I then create a new node using newXMLNode().

xmlfile = xmlTreeParse(filename, useInternalNodes = TRUE)
xmltop = xmlRoot(xmlfile)
newNode = newXMLNode(name = "newlyCreatedNode")

Goal

My goal is to insert my newly created newNode as a child of the node that has a certain value (for example "sth23") as the searchA attribute.
So in this case I want the result to look like this (notice the <newlyCreatedNode/> near the bottom):

<rootnode sth="something" descr="ex">
  <tag sth="sth1" descr="ex" anoAttr="sth2">
    <tag sth="sth3" descr="ex2" searchA="sth4" anoAttr="sth5">
      <tag sth="sth6" descr="ex3" oAttr="sth7" searchA="sth8" anoAttr="sth9">
        <tag sth="sth10" descr="ex4" oAttr="sth11" searchA="sth12" anoAttr="sth13">
          <someContent/>
        </tag>
        <someContent/>
      </tag>
      <tag sth="sth14" descr="ex5" oAttr="sth15" searchA="sth16" anoAttr="sth17">
        <someContent/>
      </tag>
      <tag sth="sth1" descr="ex6" oAttr="sth15" searchA="sth18" anoAttr="sth17">
        <someContent/>
      </tag>
    </tag>
    <tag sth="sth10" descr="ex2" oAttr="sth19" searchA="sth20" anoAttr="sth9">
      <someContent/>
    </tag>
    <tag sth="sth10" descr="ex7" searchA="sth21" anoAttr="sth13">
      <tag sth="sth21" descr="ex8" oAttr="sth22" searchA="sth23" anoAttr="sth9">
        <tag sth="sth23" descr="ex9" oAttr="sth22" searchA="sth24" anoAttr="sth5">
          <someContent/>
        </tag>
        <someContent/>
        <newlyCreatedNode/>
      </tag>
    </tag>
  </tag>
  <otherNode>
    <someNode/>
  </otherNode>
</rootnode>

Basically, in this case addChildren(xmltop[[1]][[3]][[1]], kids = list(newNode)) gets me the result that I want. Of course I do not want to specify [[1]][[3]][[1]].

I can get a list of all relevant nodes with xmlElementsByTagName() and get all attributes with xmlAttrs(). I can even get a logical index vector which gives me the correct location.

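# Collect every <tag> element at any depth below the root
listOfNodes = xmlElementsByTagName(el = xmltop, "tag", recursive = TRUE)
# One named character vector of attributes per node
attributeList = lapply(listOfNodes, FUN = function(x) xmlAttrs(x))
# TRUE where searchA matches, NA where the node has no searchA attribute
indexVector = sapply(attributeList, FUN = function(x) x["searchA"] == "sth23")
indexVector[is.na(indexVector)] = FALSE
listOfNodes[indexVector]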

What I do not know is how to use this information to insert my node into the tree at the correct location.
listOfNodes[indexVector] gives me the correct node, but it is now a list and not a node I can use addChildren() on.
Even if I somehow managed to map the indexVector and the xmlSize() of all nodes to the correct indices that I could use on xmltop directly, I would still have the problem of a variable number of double brackets (xmltop[[1]][[3]] vs xmltop[[1]][[2]][[1]]).

I have also tried several other functions of the XML package, including xmlApply, getNodeLocation and getNodeSet, but they did not seem to help.

I do not really understand the difference between xmlTreeParse(), xmlInternalTreeParse() and xmlTreeParse(useInternalNodes = TRUE), and I cannot wrap my head around XPath, so I did not get very far trying to use it.

Any helpful pointers would be much appreciated.

Recommended answer

The reason for my confusion was the help page for ?xmlElementsByTagName. It says there:

"The addition of the recursive argument makes this function behave like the getElementsByTagName in other language APIs such as Java, C#. However, one should be careful to understand that in those languages, one would get back a set of node objects. These nodes have references to their parents and children. Therefore one can navigate the tree from each node, find its relations, etc. In the current version of this package (and for the forseeable future), the node set is a "copy" of the nodes in the original tree. And these have no facilities for finding their siblings or parent."

This made me think that the function returns a list of copies instead of references to the nodes themselves.
This might possibly be the case if the xml was parsed with the flag useInternalNodes of the xmlTreeParse() function set to FALSE, but if it is set to TRUE when parsing, the list returned by xmlElementsByTagName() seems to contain the actual references.
These can easily be manipulated using for example addChildren().
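
To illustrate the point about references, here is a minimal sketch on a small made-up document. The XML string and the variable names (txt, doc, top, nodes, isTarget, target) are assumptions for illustration and are not part of the original question. It only checks that a change made through the list returned by xmlElementsByTagName() shows up in the parsed document when useInternalNodes = TRUE.

```r
library(XML)

# Hypothetical two-level document, much smaller than the one in the question
txt <- '<root><tag searchA="a"><tag searchA="b"/></tag></root>'

# Parse with internal (C-level) nodes, as in the question
doc <- xmlTreeParse(txt, asText = TRUE, useInternalNodes = TRUE)
top <- xmlRoot(doc)

# All <tag> elements at any depth
nodes <- xmlElementsByTagName(top, "tag", recursive = TRUE)

# Locate the node whose searchA attribute equals "b";
# default = "" avoids NULL for nodes without the attribute
isTarget <- sapply(nodes, function(n) xmlGetAttr(n, "searchA", default = "") == "b")
target <- nodes[[which(isTarget)]]

# Modify the node obtained from the list
addChildren(target, kids = list(newXMLNode("newlyCreatedNode")))

# Print the document to see whether the change reached the original tree
cat(saveXML(doc))
```

If the list held copies, the printed document would be unchanged; with internal nodes the new element should appear inside the tag whose searchA is "b".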

In short, the very simple solution to my problem is:

addChildren(listOfNodes[[which(indexVector)]], kids = list(newNode))
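
Since the question mentions XPath, the same lookup can probably also be written with getNodeSet(). The following is a sketch of that alternative, not the approach from the answer above. It uses a cut-down, hypothetical version of the document, and the names txt, doc and hits are made up for illustration. It assumes, as stated in the question, that the searchA value is unique.

```r
library(XML)

# Cut-down, hypothetical version of the document from the question
txt <- '<rootnode>
  <tag sth="sth1">
    <tag sth="sth10" searchA="sth21">
      <tag sth="sth21" searchA="sth23">
        <tag sth="sth23" searchA="sth24"/>
      </tag>
    </tag>
  </tag>
</rootnode>'

# xmlParse() uses internal nodes by default
doc <- xmlParse(txt, asText = TRUE)

# "//tag" matches tag elements at any depth;
# "[@searchA='sth23']" keeps only those with that attribute value
hits <- getNodeSet(doc, "//tag[@searchA='sth23']")
stopifnot(length(hits) == 1)  # relies on searchA being unique

# hits[[1]] is a single internal node, so addChildren() can be used on it
addChildren(hits[[1]], kids = list(newXMLNode("newlyCreatedNode")))

cat(saveXML(doc))
```

The XPath expression does not depend on how deeply the target is nested, which avoids the variable number of double brackets described in the question.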
