BeautifulSoup - How to find a specific class name alone


Question

How can I find the li tags that have one specific class name and no others? For example:

...
<li> no wanted </li>
<li class="a"> not his one </li>
<li class="a z"> neither this one </li>
<li class="b z"> neither this one </li>
<li class="c z"> neither this one </li>
...
<li class="z"> I WANT THIS ONLY ONE</li>
...

bs4.find_all('li', class_='z') returns several entries, including those where "z" appears together with another class name.

How can I find only the entry whose class name is "z" alone?

Recommended answer

You can use a CSS attribute selector to match the exact class name.

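from bs4 import BeautifulSoup

html = '''<li> no wanted </li>
<li class="a"> not his one </li>
<li class="a z"> neither this one </li>
<li class="b z"> neither this one </li>
<li class="c z"> neither this one </li>
<li class="z"> I WANT THIS ONLY ONE</li>'''

# The 'lxml' parser requires the lxml package to be installed.
soup = BeautifulSoup(html, 'lxml')

# The attribute selector matches only when the whole class value is exactly "z".
tags = soup.select('li[class="z"]')
print(tags)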

The same result can be achieved with a lambda that compares the tag's class list to ['z']:

tags = soup.find_all(lambda tag: tag.name == 'li' and tag.get('class') == ['z'])

Output:

[<li class="z"> I WANT THIS ONLY ONE</li>]
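To make the difference between the loose match and the exact match concrete, here is a minimal sketch. The helper name find_exact_class is hypothetical (it is not part of Beautiful Soup), the sample HTML is shortened, and 'html.parser' is used only so the snippet runs without lxml installed.

```python
from bs4 import BeautifulSoup

html = '''<li> no wanted </li>
<li class="a"> not his one </li>
<li class="a z"> neither this one </li>
<li class="z"> I WANT THIS ONLY ONE</li>'''

# html.parser ships with Python, so no extra parser install is needed here.
soup = BeautifulSoup(html, 'html.parser')


def find_exact_class(soup, tag_name, class_name):
    """Hypothetical helper: tags whose class list is exactly [class_name]."""
    return [
        tag for tag in soup.find_all(tag_name, class_=class_name)
        if tag.get('class') == [class_name]
    ]


# Loose match: every li that has "z" among its classes.
loose = soup.find_all('li', class_='z')

# Exact match: only the li whose class list is ['z'].
exact = find_exact_class(soup, 'li', 'z')

print(len(loose), len(exact))
```

The helper first narrows the search with class_ and then filters on the full class list, which is the same comparison the lambda above performs.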


Have a look at the "Multi-valued attributes" section of the Beautiful Soup documentation. It explains why class_='z' matches every tag that has z among its class names. Quoting the documentation:

HTML 4 defines a few attributes that can have multiple values. HTML 5 removes a couple of them, but defines a few more. The most common multi-valued attribute is class (that is, a tag can have more than one CSS class). Others include rel, rev, accept-charset, headers, and accesskey. Beautiful Soup presents the value(s) of a multi-valued attribute as a list:

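from bs4 import BeautifulSoup

# Naming the parser explicitly avoids the "no parser was explicitly specified" warning.
css_soup = BeautifulSoup('<p class="body"></p>', 'html.parser')
css_soup.p['class']
# ['body']

css_soup = BeautifulSoup('<p class="body strikeout"></p>', 'html.parser')
css_soup.p['class']
# ['body', 'strikeout']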
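Because class is stored as a list, class_='z' behaves roughly like a membership test on that list, which is why tags with extra classes also match. The snippet below is a small sketch of that idea on made-up markup; it illustrates the observable behavior, not Beautiful Soup's internal implementation.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration: two classes, one class, and no class at all.
soup = BeautifulSoup(
    '<li class="a z"></li><li class="z"></li><li></li>', 'html.parser'
)

# tag.get('class') returns a list of class names, or None when the attribute is absent.
class_lists = [tag.get('class') for tag in soup.find_all('li')]

# A membership test on the list matches both "a z" and "z".
has_z = [cl is not None and 'z' in cl for cl in class_lists]

# Comparing the whole list keeps only the tag whose sole class is "z".
only_z = [cl == ['z'] for cl in class_lists]

print(class_lists, has_z, only_z)
```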
