BeautifulSoup - How to find a specific class name alone
Problem description
How can I find the li tags that have a specific class name and no other classes? For example:
...
<li> not wanted </li>
<li class="a"> not this one </li>
<li class="a z"> neither this one </li>
<li class="b z"> neither this one </li>
<li class="c z"> neither this one </li>
...
<li class="z"> I WANT THIS ONLY ONE</li>
...
bs4.find_all('li', class_='z')
returns several entries that have "z" together with another class name.
如何单独查找类名称为"z"
的条目?
How to find the entry with the class name "z"
, alone ?
Recommended answer
You can use a CSS attribute selector, which compares the whole class attribute, to match the exact class name.
from bs4 import BeautifulSoup
html = '''<li> not wanted </li>
<li class="a"> not this one </li>
<li class="a z"> neither this one </li>
<li class="b z"> neither this one </li>
<li class="c z"> neither this one </li>
<li class="z"> I WANT THIS ONLY ONE</li>'''
soup = BeautifulSoup(html, 'lxml')
# [class="z"] must equal the whole class attribute, so "a z", "b z", etc. are excluded
tags = soup.select('li[class="z"]')
print(tags)
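A quick way to see how this differs from the usual class selector is the small sketch below. The three-item markup is made up for illustration. As far as I can tell, select() (backed by soupsieve) compares the class attribute as the space-joined list of class names, so the match is exact and also order-sensitive.

```python
from bs4 import BeautifulSoup

# Illustrative markup (not from the question): same classes, different order.
html = '''<li class="a z">a z</li>
<li class="z">z only</li>
<li class="z a">z a</li>'''
soup = BeautifulSoup(html, 'html.parser')

# Class selector: any <li> that has "z" among its classes.
by_class = [li.get_text() for li in soup.select('li.z')]

# Attribute selector: the class attribute must equal "z" as a whole string.
exact = [li.get_text() for li in soup.select('li[class="z"]')]

# Because it is a string comparison, "a z" does not match class="z a".
ordered = [li.get_text() for li in soup.select('li[class="a z"]')]

print(by_class)  # ['a z', 'z only', 'z a']
print(exact)     # ['z only']
print(ordered)   # ['a z']
```

So li.z behaves like class_='z', while li[class="..."] only matches when the classes appear exactly as written.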
You can get the same result by passing a lambda to find_all():
tags = soup.find_all(lambda tag: tag.name == 'li' and tag.get('class') == ['z'])
Output:
[<li class="z"> I WANT THIS ONLY ONE</li>]
Have a look at the "Multi-valued attributes" section of the Beautiful Soup documentation (quoted below). It explains why class_='z' matches every tag that has z among its class names.
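To see this for yourself, print the class value of every tag that class_='z' returns. The sketch uses the markup from the question with the elided "..." parts left out, and 'html.parser' so it runs without lxml installed.

```python
from bs4 import BeautifulSoup

html = '''<li> not wanted </li>
<li class="a"> not this one </li>
<li class="a z"> neither this one </li>
<li class="b z"> neither this one </li>
<li class="c z"> neither this one </li>
<li class="z"> I WANT THIS ONLY ONE</li>'''
soup = BeautifulSoup(html, 'html.parser')

# class_='z' succeeds if "z" is any one of the tag's classes.
matches = soup.find_all('li', class_='z')
class_lists = [li['class'] for li in matches]

for classes in class_lists:
    print(classes)
# ['a', 'z']
# ['b', 'z']
# ['c', 'z']
# ['z']
```

Each class attribute has been split into a list, and the search succeeds as soon as one element equals "z".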
HTML 4 defines a few attributes that can have multiple values. HTML 5 removes a couple of them, but defines a few more. The most common multi-valued attribute is class (that is, a tag can have more than one CSS class). Others include rel, rev, accept-charset, headers, and accesskey. Beautiful Soup presents the value(s) of a multi-valued attribute as a list:
css_soup = BeautifulSoup('<p class="body"></p>')
css_soup.p['class']
# ["body"]
css_soup = BeautifulSoup('<p class="body strikeout"></p>')
css_soup.p['class']
# ["body", "strikeout"]