Select all div siblings by using BeautifulSoup
Question
I have an HTML file with a structure like the following:
<div>
</div>
<div>
</div>
<div>
<div>
</div>
<div>
</div>
<div>
</div>
<div>
<div>
<div>
</div>
</div>
I would like to select all the sibling divs without selecting the nested divs in the third and fourth blocks. If I use find_all(), I get all the divs, nested ones included.
Recommended answer
You can select the direct children of the parent element:
soup.select('body > div')
This returns all div elements that are direct children of the top-level body tag.
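As a rough illustration of the difference between find_all() and the child combinator, here is a minimal sketch. The sample markup and the id attributes are assumptions made up for this example (the question's divs carry no ids), and the explicit body tag is included because html.parser does not add one on its own.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: four top-level divs, the last two containing nested divs.
html = """
<html><body>
<div id="a"></div>
<div id="b"></div>
<div id="c"><div id="c1"></div><div id="c2"></div></div>
<div id="d"><div id="d1"><div id="d2"></div></div></div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all() searches every nesting level, so nested divs are included.
every_div_ids = [div["id"] for div in soup.find_all("div")]

# The child combinator (>) matches only divs whose parent is body.
top_level_ids = [div["id"] for div in soup.select("body > div")]

print(every_div_ids)
print(top_level_ids)
```

If the divs sit inside some other container rather than body, the selector on the left of > would need to name that container instead.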
You could also find the first div, then grab all matching siblings with Element.find_next_siblings():
first_div = soup.find('div')
all_divs = [first_div] + first_div.find_next_siblings('div')
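The sibling approach can be tried on a small document as follows. This is only a sketch: the markup and ids are invented for illustration, and it assumes the first div in document order is one of the top-level divs. If an earlier wrapper div exists, soup.find('div') would return that wrapper instead.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: three sibling divs, a paragraph between them, one nested div.
html = (
    '<body>'
    '<div id="a"></div>'
    '<p>not a div</p>'
    '<div id="b"><div id="b1"></div></div>'
    '<div id="c"></div>'
    '</body>'
)

soup = BeautifulSoup(html, "html.parser")

# find() returns the first div in document order.
first_div = soup.find("div")

# find_next_siblings() only looks at later nodes with the same parent,
# so the nested div and the paragraph are both left out.
all_divs = [first_div] + first_div.find_next_siblings("div")
sibling_ids = [div["id"] for div in all_divs]

print(sibling_ids)
```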
Or you could use the element.children generator and filter those:
all_divs = (elem for elem in top_level.children if getattr(elem, 'name', None) == 'div')
Here, top_level is the element that directly contains these div elements.
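A self-contained version of the generator approach might look like the sketch below. The sample markup is hypothetical, and choosing soup.body as top_level is an assumption; any tag that directly contains the divs would do.

```python
from bs4 import BeautifulSoup

# Hypothetical sample with whitespace between tags, which shows up as text nodes.
html = """
<html><body>
<div id="a"></div>
<div id="b"><div id="b1"></div></div>
<span id="s"></span>
<div id="c"></div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")
top_level = soup.body  # assumed container for this example

# .children yields direct children only, including text nodes.
# getattr() with a default keeps the filter safe for nodes without a tag name.
all_divs = (
    elem for elem in top_level.children
    if getattr(elem, "name", None) == "div"
)

# The generator is consumed once; materialize it if it is needed again.
child_ids = [div["id"] for div in all_divs]

print(child_ids)
```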