Select all divs except ones with certain classes in BeautifulSoup
Question
As discussed in this question, one can easily get all divs with certain classes. But here I have a list of classes that I want to exclude, and I want to get all divs that don't have any of the classes in that list.
For example,
classToIgnore = ["class1", "class2", "class3"]
Now I want to get all divs that don't contain any of the classes in the list above. How can I achieve that?
Recommended answer
Alternative solution
soup.find_all('div', class_=lambda x: x not in classToIgnore)
Example
from bs4 import BeautifulSoup
html = """
<div class="c1"></div>
<div class="c1"></div>
<div class="c2"></div>
<div class="c3"></div>
<div class="c4"></div>
"""
soup = BeautifulSoup(html, 'html.parser')
classToIgnore = ["c1", "c2"]
print(soup.find_all('div', class_=lambda x: x not in classToIgnore))
Output
[<div class="c3"></div>, <div class="c4"></div>]
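One caveat with the lambda filter above: as far as I can tell, BeautifulSoup calls the function once per class value, so a div such as class="c1 extra" would still match because "extra" is not in the list. Below is a minimal sketch, assuming you want to drop any div that carries at least one ignored class. The helper name has_no_ignored_class and the sample HTML are made up for illustration.

```python
from bs4 import BeautifulSoup

# Sample HTML (hypothetical): one multi-class div and one div with no class
html = """
<div class="c1"></div>
<div class="c1 extra"></div>
<div class="c3"></div>
<div></div>
"""
soup = BeautifulSoup(html, 'html.parser')
classToIgnore = {"c1", "c2"}

def has_no_ignored_class(tag):
    # tag.get('class', []) is the tag's list of classes, or [] if it has none
    return tag.name == 'div' and classToIgnore.isdisjoint(tag.get('class', []))

# Passing a function to find_all() tests each tag as a whole
kept = soup.find_all(has_no_ignored_class)
print(kept)
```

Here the filter looks at the full class list of each tag, so divs without any class attribute are kept and multi-class divs are excluded as soon as one of their classes is in classToIgnore.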
If the divs are nested, try removing the unwanted inner divs with decompose() and then just call find_all('div'). Note that decompose() modifies the soup in place.
for div in soup.find_all('div', class_=lambda x: x in classToIgnore):
    div.decompose()
print(soup.find_all('div'))
This might leave some extra whitespace behind, but you can easily strip it out later.
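If you would rather not modify the tree, a CSS selector with :not() may also work. This is only a sketch, assuming a BeautifulSoup version whose select() is backed by Soup Sieve (4.7 or later) and class names that are valid CSS identifiers. The sample HTML is made up for illustration.

```python
from bs4 import BeautifulSoup

# Sample HTML (hypothetical): a nested div and a multi-class div
html = """
<div class="c1"><div class="c3">inner</div></div>
<div class="c2 extra"></div>
<div class="c4"></div>
"""
soup = BeautifulSoup(html, 'html.parser')
classToIgnore = ["c1", "c2"]

# Builds "div:not(.c1):not(.c2)" from the list of classes to exclude
selector = 'div' + ''.join(f':not(.{c})' for c in classToIgnore)
result = soup.select(selector)
print(result)
```

Unlike the decompose() approach, this keeps divs that are nested inside an ignored div (the inner c3 div here is still selected), so pick whichever behaviour matches what you need.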