Select all divs except ones with certain classes in BeautifulSoup


Question

As discussed in a related question, it is easy to get all divs that have certain classes. Here, however, I have a list of classes to exclude, and I want to get all divs that do not have any of the classes in that list.

For example:

classToIgnore = ["class1", "class2", "class3"]



Now I want to get all divs that do not contain any of the classes in the list above. How can I achieve that?

Recommended answer

Alternative solution

soup.find_all('div', class_=lambda x: x not in classToIgnore)

Example

from bs4 import BeautifulSoup
html = """
<div class="c1"></div>
<div class="c1"></div>
<div class="c2"></div>
<div class="c3"></div>
<div class="c4"></div>
"""
soup = BeautifulSoup(html, 'html.parser')
classToIgnore = ["c1", "c2"]
print(soup.find_all('div', class_=lambda x: x not in classToIgnore))

Output

[<div class="c3"></div>, <div class="c4"></div>]
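One caveat with the class_ lambda: as far as I understand BeautifulSoup's handling of multi-valued attributes, a function passed to class_ is tested against each class name individually, so a div such as class="c1 c5" can still match because "c5" is not in the list. A minimal sketch of a stricter filter follows, assuming you want to reject a div if any of its classes is excluded and to keep divs with no class at all. The names is_allowed_div and class_to_ignore are hypothetical and not part of the original answer.

```python
from bs4 import BeautifulSoup

html = """
<div class="c1 c5"></div>
<div class="c3"></div>
<div></div>
"""
soup = BeautifulSoup(html, "html.parser")
class_to_ignore = {"c1", "c2"}


def is_allowed_div(tag):
    # Hypothetical helper: keep a div only if none of its classes is excluded.
    # tag.get("class", []) yields a list of class names, or [] when there is none.
    classes = tag.get("class", [])
    return tag.name == "div" and class_to_ignore.isdisjoint(classes)


# find_all accepts a function that receives each Tag and returns True or False.
allowed = soup.find_all(is_allowed_div)
print(allowed)
```

Passing a function that receives the whole tag, rather than a single attribute value, lets the check look at all of a div's classes at once.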

If you are dealing with nested divs, try removing the unwanted inner divs with decompose and then just call find_all('div'):

for div in soup.find_all('div', class_=lambda x: x in classToIgnore):
    div.decompose()
print(soup.find_all('div'))

This might leave some extra whitespace, but you can easily strip it off later.
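To make the nested case concrete, here is a small self-contained sketch of the decompose approach. The HTML and the variable names (class_to_ignore, remaining) are made up for illustration. Note that decompose modifies the parsed tree in place, so the excluded divs are gone from soup afterwards.

```python
from bs4 import BeautifulSoup

html = """
<div class="c3">
  keep me
  <div class="c1">drop me</div>
</div>
<div class="c2">drop me too</div>
<div class="c4">keep me as well</div>
"""
soup = BeautifulSoup(html, "html.parser")
class_to_ignore = ["c1", "c2"]

# find_all returns a list of matches collected up front, so removing
# tags from the tree inside the loop does not disturb the iteration.
for div in soup.find_all("div", class_=lambda c: c in class_to_ignore):
    div.decompose()

# Only divs without an excluded class are left, including the outer c3 div,
# which no longer contains the inner c1 div.
remaining = soup.find_all("div")
remaining_classes = [div["class"] for div in remaining]
print(remaining_classes)
```

If the original tree is needed later, parsing the HTML a second time before removing anything is one simple way to keep an untouched copy.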
