Searching on class tags with multiple spaces and wildcards with BeautifulSoup


Problem description

I am trying to use BeautifulSoup to find all div containers whose class attribute begins with "foo bar". I had hoped the following would work:

from bs4 import BeautifulSoup
import re

# soup is a BeautifulSoup object built from the page being searched
soup.find_all('div', class_=re.compile('^foo bar'))

However, it seems that the class definition is separated into a list, like ['foo','bar'], such that regular expressions are not able to accomplish my task. Is there a way I can accomplish this task? (I have reviewed a number of other posts, but have not found a working solution)

Solution

You can pass class_ a function that returns True or False; a lambda does the trick too:

from bs4 import BeautifulSoup

html = '''
<div class="foo bar bing"></div>
<div class="foo bang"></div>
<div class="foo bar1 bang"></div>
'''
soup = BeautifulSoup(html, 'lxml')

# With a trailing space, only the exact class "bar" in second position matches
res = soup.find_all('div', class_=lambda s: s.startswith('foo bar '))
print(res)
# [<div class="foo bar bing"></div>]

# Without the trailing space, "bar1" matches as well
res = soup.find_all('div', class_=lambda s: s.startswith('foo bar'))
print(res)
# [<div class="foo bar bing"></div>, <div class="foo bar1 bang"></div>]
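Since no single class token contains a space, the lambda above can only match if it is also handed the full space-joined class string. Here is a minimal sketch that records what the filter receives. The names `record` and `seen` are made up for illustration, and `html.parser` is used only so the snippet has no extra dependency:

```python
from bs4 import BeautifulSoup

html = '<div class="foo bar bing"></div>'
soup = BeautifulSoup(html, 'html.parser')

# A multi-valued attribute such as class is stored as a list of tokens
tokens = soup.div['class']
print(tokens)

seen = []  # every value BeautifulSoup passes to the filter

def record(value):
    # Hypothetical helper: remember the argument, then apply the prefix test
    seen.append(value)
    return value is not None and value.startswith('foo bar ')

matches = soup.find_all('div', class_=record)
print(seen)
print(matches)
```

Printing `seen` shows which values were tried; the joined string is the one that makes the prefix test succeed.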


The same filter written as a named function:

def is_a_match(clas):
    # clas is None for tags that have no class attribute
    return clas is not None and clas.startswith('foo bar')

res = soup.find_all('div', class_=is_a_match)
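A CSS attribute selector may be another option, assuming a BeautifulSoup version recent enough to ship `select()` backed by soupsieve. This is a sketch rather than part of the original answer. The sample HTML adds a div without a class to show that it is simply skipped, and the variable name `by_selector` is made up for illustration:

```python
from bs4 import BeautifulSoup

html = '''
<div class="foo bar bing"></div>
<div class="foo bang"></div>
<div class="foo bar1 bang"></div>
<div></div>
'''
soup = BeautifulSoup(html, 'html.parser')

# [class^="foo bar"] selects elements whose class attribute value
# begins with the literal text "foo bar"
by_selector = soup.select('div[class^="foo bar"]')

for tag in by_selector:
    print(tag['class'])
```

Like the `startswith('foo bar')` filter, this is a plain prefix test, so "foo bar1 bang" is selected too; use `[class^="foo bar "]` with a trailing space to exclude it.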

This answer may help as well: https://stackoverflow.com/a/46719313/6655211
