How to find direct children of an element in lxml
Problem description
I found an object with a specific class:
THREAD = TREE.find_class('thread')[0]
Now I want to get all <p> elements that are its direct children.
I tried:
THREAD.findall("p")
THREAD.xpath("//div[@class='thread']/p")
But all of those return every <p> element inside this <div>, no matter whether that <div> is their closest parent or not.
How can I make it work?
Sample HTML:
<div class='thread'>
<p> <!-- 1 -->
<!-- Can be some others <p> objects inside, which should not be counted -->
</p>
<p><!-- 2 --></p>
</div>
<div class='thread'>
<p>[...]</p>
<p>[...]</p>
</div>
The script should find two <p> objects that are children of THREAD. I should receive a list of two objects, marked as "1" and "2" in the comments in the sample HTML.
Yet another clarification, since people get confused:
THREAD is some object stored in a variable; it can be any HTML element. I want to find <p> objects that are direct children of THREAD. Those <p> elements cannot be outside THREAD, nor inside any element that is itself inside THREAD.
Recommended answer
I'm not sure, but it seems that your problem is in the HTML itself: note that there are a couple of tag omission cases applicable to p nodes, so the closing tags of paragraphs in
<div class='thread'>
<p>first
<p>second</p>
</p>
</div>
are simply ignored by the parser, and both nodes are identified as siblings rather than as parent and child, e.g.
<div class='thread'>
<p>first
<p>second
</div>
So the XPath //div[@class="thread"]/p will return both paragraphs.
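The sibling behaviour can be checked with a minimal sketch like the one below. It assumes the markup is parsed with lxml.html (the question does not say which parser built TREE), and the names NESTED_P, thread and second are only illustrative.

```python
import lxml.html

# Markup from the answer: a <p> written inside another <p>.
NESTED_P = """
<div class='thread'>
<p>first
<p>second</p>
</p>
</div>
"""

doc = lxml.html.fromstring(NESTED_P)
thread = doc.find_class('thread')[0]

# Paragraphs that are still nested inside another paragraph after parsing.
nested_paragraphs = thread.xpath('.//p//p')

# Locate the "second" paragraph and look at its parent in the parsed tree.
second = [p for p in thread.iter('p') if (p.text or '').strip() == 'second'][0]
second_parent = second.getparent()

print(len(nested_paragraphs))
print(second_parent.tag, second_parent.get('class'))
```

If the assumption holds, no paragraph remains nested and the parent of "second" is the thread <div> itself, which is why a direct-child query also returns it.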
You can simply replace the p tags with div tags and you'll see different behaviour:
<div class='thread'>
<div>first
<div>second</div>
</div>
</div>
Here //div[@class="thread"]/div returns only the first node.
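A short sketch of the same check for the div variant, again assuming lxml.html as the parser; NESTED_DIV, direct_divs and all_divs are illustrative names, not part of the original question.

```python
import lxml.html

# Same structure as above, but with <div> instead of <p>.
# Nested <div> elements are valid HTML, so the nesting is preserved.
NESTED_DIV = """
<div class='thread'>
<div>first
<div>second</div>
</div>
</div>
"""

doc = lxml.html.fromstring(NESTED_DIV)

# Direct <div> children of the thread element.
direct_divs = doc.xpath('//div[@class="thread"]/div')

# Every <div> below the thread element, at any depth.
all_divs = doc.xpath('//div[@class="thread"]//div')

print(len(direct_divs), len(all_divs))
print(direct_divs[0].text.strip())
```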
Please correct me if my assumption is incorrect.
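One more thing that may be worth checking, independent of the tag omission issue: an XPath expression that starts with // is evaluated from the document root even when it is called on an element, so it also matches other thread blocks. The sketch below compares it with the relative form ./p and with findall("p"). The SAMPLE markup is hypothetical and uses a <blockquote> to hold a deeper paragraph, since a <p> cannot directly contain another <p>.

```python
import lxml.html

# Hypothetical document with two thread blocks; the first one also
# contains a paragraph nested one level deeper, inside a <blockquote>.
SAMPLE = """
<html><body>
<div class='thread'>
<p>1</p>
<p>2</p>
<blockquote><p>quoted</p></blockquote>
</div>
<div class='thread'>
<p>other</p>
<p>other</p>
</div>
</body></html>
"""

TREE = lxml.html.fromstring(SAMPLE)
THREAD = TREE.find_class('thread')[0]

# Relative XPath: only <p> elements that are direct children of THREAD.
relative = THREAD.xpath('./p')

# ElementPath: a bare tag name also matches direct children only.
via_findall = THREAD.findall('p')

# Absolute XPath: starts at the document root, so it matches the
# direct <p> children of every thread <div> in the document.
absolute = THREAD.xpath("//div[@class='thread']/p")

# All <p> descendants of THREAD, at any depth.
descendants = THREAD.xpath('.//p')

print(len(relative), len(via_findall), len(absolute), len(descendants))
```

With this sample, the relative query and findall both stay inside THREAD, while the absolute query also picks up paragraphs from the second block.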