BeautifulSoup - find_all div tags with different class name


Problem description

I want to select all <div> elements whose class is either "post has-profile bg2" or "post has-profile bg1", but not the last one, i.e. the one with class "panel".

<div id="6" class="post has-profile bg2"> some text 1 </div>
<div id="7" class="post has-profile bg1"> some text 2 </div>
<div id="8" class="post has-profile bg2"> some text 3 </div>
<div id="9" class="post has-profile bg1"> some text 4 </div>

<div class="panel bg1" id="abc"> ... </div>

select() matches only a single occurrence. I'm trying find_all() instead, but bs4 is not able to find the elements.

if soup.find(class_ = re.compile(r"post has-profile [bg1|bg2]")):
    posts = soup.find_all(class_ = re.compile(r"post has-profile [bg1|bg2]"))

How can this be solved with regex and without regex? Thanks.

Recommended answer

You can use the built-in CSS selector support in BeautifulSoup:

data = """<div id="6" class="post has-profile bg2"> some text 1 </div>
<div id="7" class="post has-profile bg1"> some text 2 </div>
<div id="8" class="post has-profile bg2"> some text 3 </div>
<div id="9" class="post has-profile bg1"> some text 4 </div>
<div class="panel bg1" id="abc"> ... </div>"""

from bs4 import BeautifulSoup

soup = BeautifulSoup(data, 'lxml')

divs = soup.select('div.post.has-profile.bg2, div.post.has-profile.bg1')

for div in divs:
    print(div)
    print('-' * 80)

Prints (the order of the matches may differ between bs4 versions):

<div class="post has-profile bg2" id="6"> some text 1 </div>
--------------------------------------------------------------------------------
<div class="post has-profile bg2" id="8"> some text 3 </div>
--------------------------------------------------------------------------------
<div class="post has-profile bg1" id="7"> some text 2 </div>
--------------------------------------------------------------------------------
<div class="post has-profile bg1" id="9"> some text 4 </div>
--------------------------------------------------------------------------------

The 'div.post.has-profile.bg2, div.post.has-profile.bg1' selector selects all <div> tags with class "post has-profile bg2" and all <div> tags with class "post has-profile bg1". The <div class="panel bg1"> tag is skipped because it carries neither the "post" nor the "has-profile" class.
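Since the question also asks about find_all() with and without regex, here is a minimal sketch of both, not the only way to do it. It assumes the same sample markup and uses the standard-library html.parser instead of lxml. The names REQUIRED, VARIANTS and is_post_div are made up for this illustration and are not part of bs4.

```python
import re
from bs4 import BeautifulSoup

data = """<div id="6" class="post has-profile bg2"> some text 1 </div>
<div id="7" class="post has-profile bg1"> some text 2 </div>
<div id="8" class="post has-profile bg2"> some text 3 </div>
<div id="9" class="post has-profile bg1"> some text 4 </div>
<div class="panel bg1" id="abc"> ... </div>"""

# html.parser ships with Python, so this sketch needs no extra parser.
soup = BeautifulSoup(data, "html.parser")

REQUIRED = {"post", "has-profile"}  # classes every wanted <div> must carry
VARIANTS = {"bg1", "bg2"}           # at least one of these must be present


def is_post_div(tag):
    """Hypothetical helper: True for <div> tags with the wanted classes."""
    classes = set(tag.get("class", []))
    return tag.name == "div" and REQUIRED <= classes and bool(VARIANTS & classes)


# Without regex: find_all() accepts a function that is called on every tag.
posts_plain = soup.find_all(is_post_div)

# With regex: bg[12] means "bg1 or bg2". The [bg1|bg2] from the question is a
# character class, i.e. it matches one single character out of b, g, 1, |, 2.
# bs4 tests the pattern against each individual class of a tag.
bg_pattern = re.compile(r"^bg[12]$")
posts_regex = [
    div
    for div in soup.find_all("div", class_=bg_pattern)
    if REQUIRED <= set(div["class"])  # drops <div class="panel bg1">
]

print([div["id"] for div in posts_plain])
print([div["id"] for div in posts_regex])
```

Both lists contain the ids 6, 7, 8 and 9 in document order. The extra REQUIRED check in the regex variant is needed because the pattern alone would also match the "bg1" class of the panel <div>.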
