Most Pythonic way to find the sibling of an element in XML


Problem description

Problem: I have the following XML snippet:

...snip...
<p class="p_cat_heading">DEFINITION</p>
<p class="p_numberedbullet"><span class="calibre10">This</span>, <span class="calibre10">these</span>. </p>
<p class="p_cat_heading">PRONUNCIATION </p>
..snip...

I need to search the entire XML document, find the heading whose text is DEFINITION, and print the associated definitions. The format of the definitions is not consistent, and their attributes/elements can change, so the only reliable way of capturing all of it is to read until the next element whose class attribute is p_cat_heading.

Right now I am using the following code to find all of the headers:

for heading in root.findall(".//*[@class='p_cat_heading']"):
    if heading.text == "DEFINITION":
        pass  # found the correct header; take action here

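Things I have tried:

  • Using lxml's getnext method. This gets the next sibling that has the "p_cat_heading" attribute, which is not what I want.
  • following_sibling - lxml is supposed to support this, but it throws "following-sibling is not found in prefix map".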

My solution:

I haven't finished it, but because my XML is short I was just going to get a list of all elements, iterate until I reach the one with the DEFINITION text, and then keep iterating until the next element with the p_cat_heading class. This solution is horrible and ugly, but I can't seem to find a clean alternative.

What I am looking for:

A more Pythonic way of printing the definition, which is "This, these." in our case. The solution may use either xpath or some alternative. Python-native solutions are preferred, but anything will do.

Recommended answer

There are a couple of ways of doing this, but by relying on xpath to do most of the work, this expression

//*[@class='p_cat_heading'][contains(text(),'DEFINITION')]/following-sibling::*[1]

should work.

With lxml:

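from lxml import html

# The XML snippet from the question
data = """
<p class="p_cat_heading">DEFINITION</p>
<p class="p_numberedbullet"><span class="calibre10">This</span>, <span class="calibre10">these</span>. </p>
<p class="p_cat_heading">PRONUNCIATION </p>
"""
# Select the first sibling element after the DEFINITION heading
exp = "//*[@class='p_cat_heading'][contains(text(),'DEFINITION')]/following-sibling::*[1]"

tree = html.fromstring(data)
target = tree.xpath(exp)

for i in target:
    print(i.text_content())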

Output:

This, these.
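The expression above selects only the first sibling after the heading. Since the question says a definition may span several elements and asks for a Python-native option, here is a minimal sketch using only the standard library's xml.etree.ElementTree. Its limited XPath subset has no following-sibling axis, so the sketch walks each parent's children directly and collects everything up to the next heading. The function name section_after and the wrapping <body> root are assumptions made for illustration; they are not part of the original question or answer.

```python
import xml.etree.ElementTree as ET

# The snippet from the question, wrapped in a hypothetical <body> root
# so that it parses as a single well-formed document.
data = """<body>
<p class="p_cat_heading">DEFINITION</p>
<p class="p_numberedbullet"><span class="calibre10">This</span>, <span class="calibre10">these</span>. </p>
<p class="p_cat_heading">PRONUNCIATION </p>
</body>"""


def section_after(root, title, heading_class="p_cat_heading"):
    """Return the sibling elements between the heading `title` and the next heading."""
    for parent in root.iter():
        collected = None  # stays None until the wanted heading has been seen
        for child in parent:
            is_heading = child.get("class") == heading_class
            if collected is not None:
                if is_heading:
                    return collected  # the next heading ends the section
                collected.append(child)
            elif is_heading and (child.text or "").strip() == title:
                collected = []  # start collecting after this heading
        if collected is not None:
            return collected  # the wanted heading was the last one under this parent
    return []


root = ET.fromstring(data)
definition = section_after(root, "DEFINITION")
# Join the full text of every collected element, including nested spans
text = " ".join("".join(el.itertext()).strip() for el in definition)
print(text)
```

With lxml, the same idea could be written with the element's itersiblings() method instead of walking the parent, stopping at the first sibling whose class is p_cat_heading.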
