Retrieve comment from specific XML node in Python

Question

I have the following "example.xml" file:

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <tag1>
    <tag2>tag2<!-- comment = "this is the tag1 comment"--></tag2>
    <tag3>
        <tag4>tag4<!-- comment = "this is the tag4 comment"--></tag4>
    </tag3>
  </tag1>
</root>

I'd like to retrieve the comment for a specific node. For now, I'm only able to retrieve all comments from the file, using the following:

from lxml import etree

tree = etree.parse("example.xml")
comments = tree.xpath('//comment()')
print(comments)

As expected, this returns all of the above comments from the file in a list:

[<!-- comment = "this is the tag1 comment"-->, <!-- comment = "this is the tag4 comment"-->]

However, how and where do I explicitly specify the node whose comment I want to retrieve? For example, how can I specify tag2 somewhere so that only <!-- comment = "this is the tag4 comment"--> is returned?

Edit

I have a use case where I need to iterate over each node of the XML file. If the iterator reaches a node that has more than one child with a comment, it returns all the comments of its children. For example, consider the following "example2.xml" file:

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <tag1>
    <tag2>
      <tag3>tag3<!-- comment = "this is the tag3 comment"--></tag3>
      <tag4>tag4<!-- comment = "this is the tag4 comment"--></tag4>
    </tag2>
  </tag1>
  <tag1>
    <tag2>
      <tag3>tag3<!-- comment = "this is the tag3 comment"--></tag3>
      <tag4>tag4<!-- comment = "this is the tag4 comment"--></tag4>
    </tag2>
  </tag1>
</root>

If I follow the same steps as above, when the loop reaches tag1/tag2, it returns all of the comments for tag3 and tag4.

That is:

from lxml import etree

tree = etree.parse("example2.xml")
comments = tree.xpath('tag1[1]/tag2//comment()')
print(comments)

returns

[<!-- comment = "this is the tag3 comment"-->, <!-- comment = "this is the tag4 comment"-->]

My two questions are therefore:

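  1. How do I return only the comment of the immediate node, without including those of any of its children?
  2. Since the result is returned as a list, how do I retrieve the value/text of the comment from that list?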

Recommended answer

You need to specify the node:

from lxml import etree

tree = etree.parse("example.xml")
# Select only the comments that are direct children of tag2
comments = tree.xpath('//tag2/comment()')
print(comments)

Output:

[<!-- comment = "this is the tag1 comment"-->]
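
The list above holds lxml comment nodes rather than strings, which is what the second question asks about. One way to get at the text is the `text` attribute of each node. The following is a minimal sketch, assuming the contents of example.xml are inlined as a byte string so that it runs on its own; the names `XML`, `tag2_comments`, and `tag2_texts` are illustrative and not part of the original answer.

```python
from lxml import etree

# Contents of example.xml, inlined so the snippet is self-contained
XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<root>
  <tag1>
    <tag2>tag2<!-- comment = "this is the tag1 comment"--></tag2>
    <tag3>
        <tag4>tag4<!-- comment = "this is the tag4 comment"--></tag4>
    </tag3>
  </tag1>
</root>
"""

root = etree.fromstring(XML)

# tag2/comment() uses the child axis, so only comments that are
# direct children of tag2 are selected (no descendants)
tag2_comments = root.xpath('//tag2/comment()')

# .text holds everything between <!-- and -->; strip() drops the
# surrounding whitespace
tag2_texts = [c.text.strip() for c in tag2_comments]
print(tag2_texts)
```

With the sample file this should print a one-element list containing `comment = "this is the tag1 comment"`. If only the quoted part is needed, it would have to be parsed out of that string separately.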

For your nested structure, you need to iterate over the repeating tags:

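from lxml import etree

tree = etree.parse("example2.xml")
# Query each repeated tag2, then look up comments relative to it
tag2_elements = tree.xpath('//tag1/tag2')
for t2 in tag2_elements:
    t3_comment = t2.xpath('tag3/comment()')
    print(t2, t3_comment)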

Output:

<Element tag2 at 0x1066b69b0> [<!-- comment = "this is the tag3 comment"-->]
<Element tag2 at 0x1066b6960> [<!-- comment = "this is the tag3 comment"-->]
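
For the use case in the edit (visiting every node and collecting only that node's own comments), the same relative `comment()` query can be applied to each element in turn. The following is a rough sketch under some assumptions: the XML is a shortened copy of example2.xml with a single tag1, inlined so the snippet is self-contained, and `direct_comments` is a hypothetical helper, not an lxml function.

```python
from lxml import etree

# Shortened copy of example2.xml (one tag1 only), inlined for the example
XML = b"""<root>
  <tag1>
    <tag2>
      <tag3>tag3<!-- comment = "this is the tag3 comment"--></tag3>
      <tag4>tag4<!-- comment = "this is the tag4 comment"--></tag4>
    </tag2>
  </tag1>
</root>
"""


def direct_comments(element):
    """Return the stripped text of comments that are direct children of element."""
    # A relative comment() step looks at children only, not descendants
    return [c.text.strip() for c in element.xpath('comment()')]


root = etree.fromstring(XML)

# iter(tag=etree.Element) walks the tree in document order and yields
# elements only, so the comment nodes themselves are not visited
results = [(el.tag, direct_comments(el)) for el in root.iter(tag=etree.Element)]

for tag, comments in results:
    print(tag, comments)
```

In this sketch tag2 ends up with an empty list, because its comments belong to tag3 and tag4 rather than to tag2 itself. Using `//comment()` or `.//comment()` instead would pull in the descendants' comments as well, which is the behaviour described in the question.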
