Most Pythonic way to find the sibling of an element in XML
Question
Problem: I have the following XML snippet:
...snip...
<p class="p_cat_heading">DEFINITION</p>
<p class="p_numberedbullet"><span class="calibre10">This</span>, <span class="calibre10">these</span>. </p>
<p class="p_cat_heading">PRONUNCIATION </p>
..snip...
I need to search the whole XML document, find the heading whose text is DEFINITION, and print the associated definitions. The format of the definitions is not consistent, and their attributes and elements can change, so the only reliable way of capturing all of it is to read until the next element with the class p_cat_heading.
Right now I am using the following code to find all of the headings:
for heading in root.findall(".//*[@class='p_cat_heading']"):
    if heading.text == "DEFINITION":
        # We found the correct header: take action here.
        pass
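Things I've tried:
- Using lxml's getnext method. This gets the next sibling with the "p_cat_heading" attribute, which is not what I want.
- following_sibling - lxml is supposed to support this, but it throws "following-sibling not found in prefix map".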
My solution:
I haven't finished it, but because my XML is short I was just going to get a list of all elements, iterate until the one with the DEFINITION text, and then iterate until the next element with the p_cat_heading class. This solution is horrible and ugly, but I can't seem to find a clean alternative.
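The scan described above could be sketched roughly as follows, using only the standard library. This is an illustration, not the asker's actual code: it assumes the headings and definitions are siblings under one parent, and the wrapping `<root>` element and the function name `collect_definition` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical <root> wrapper so the fragment parses as a single document.
SNIPPET = """<root>
<p class="p_cat_heading">DEFINITION</p>
<p class="p_numberedbullet"><span class="calibre10">This</span>, <span class="calibre10">these</span>. </p>
<p class="p_cat_heading">PRONUNCIATION </p>
</root>"""


def collect_definition(parent, heading_text="DEFINITION"):
    """Return the text of each sibling between the matching heading and the next heading."""
    collected = []
    inside = False  # True once the matching heading has been seen
    for child in parent:
        if child.get("class") == "p_cat_heading":
            if inside:
                break  # reached the next heading: stop collecting
            inside = (child.text or "").strip() == heading_text
        elif inside:
            # itertext() joins the text of nested elements such as <span>
            collected.append("".join(child.itertext()).strip())
    return collected


definitions = collect_definition(ET.fromstring(SNIPPET))
print(definitions)
```

A single pass with a flag avoids building a list of all elements first, at the cost of only looking at the direct children of one parent.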
What I'm looking for:
A more Pythonic way of printing the definition, which is "This, these." in our case. The solution may use either XPath or some alternative. Python-native solutions are preferred, but anything will do.
Recommended answer
There are a couple of ways of doing this, but by relying on XPath to do most of the work, this expression
//*[@class='p_cat_heading'][contains(text(),'DEFINITION')]/following-sibling::*[1]
should work.
Using lxml:
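from lxml import html

# The snippet from the question
data = """
<p class="p_cat_heading">DEFINITION</p>
<p class="p_numberedbullet"><span class="calibre10">This</span>, <span class="calibre10">these</span>. </p>
<p class="p_cat_heading">PRONUNCIATION </p>
"""
exp = "//*[@class='p_cat_heading'][contains(text(),'DEFINITION')]/following-sibling::*[1]"
tree = html.fromstring(data)
target = tree.xpath(exp)  # list of matching elements
for i in target:
    print(i.text_content())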
Output:
This, these.
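Note that `following-sibling::*[1]` selects only the first sibling after the heading. If a definition can span several elements, as the question suggests, one possible variation is to walk the siblings with lxml's `itersiblings()` and stop at the next heading. The sketch below is an assumption about how that might look, not part of the original answer; the extra `p_example` paragraph is hypothetical input added only to show more than one sibling being collected.

```python
from lxml import html

data = """
<p class="p_cat_heading">DEFINITION</p>
<p class="p_numberedbullet"><span class="calibre10">This</span>, <span class="calibre10">these</span>. </p>
<p class="p_example">Hypothetical extra line.</p>
<p class="p_cat_heading">PRONUNCIATION </p>
"""

tree = html.fromstring(data)

# Locate the DEFINITION heading(s) with the same predicate as the answer.
headings = tree.xpath("//*[@class='p_cat_heading'][contains(text(),'DEFINITION')]")

parts = []
for heading in headings:
    # itersiblings() yields the following siblings in document order.
    for sibling in heading.itersiblings():
        if sibling.get("class") == "p_cat_heading":
            break  # next heading reached: the definition block is over
        parts.append(sibling.text_content().strip())

print(parts)
```

The element's `xpath()` method is used here because it supports XPath axes such as `following-sibling`, whereas `find()` and `findall()` accept only the more limited ElementPath syntax.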