Beautiful Soup find_all: matching multiple classes with one query
Question
I searched thoroughly for a solution on many websites and on here, but none of them work!
I am trying to scrape flashscores.com, and I want to parse every <td> with the class name cell_ab team-home or cell_ab team-home bold.
I tried using re:
soup.find_all('td', {'class': re.compile(r"^(cell_ab team-home |cell_ab team-home bold )$")})
and
soup.find_all('td', {'class': ['cell_ab team-home ', 'cell_ab team-home bold ']})
Neither of them works.
Someone asked for the code, so here it is:
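from tkinter import END, Listbox, Tk
from selenium import webdriver
from bs4 import BeautifulSoup

# A Listbox instance is needed; insert() cannot be called on the class itself
root = Tk()
listbox = Listbox(root)
listbox.pack()

browser = webdriver.Firefox()
browser.get('http://www.flashscore.com/')
html = browser.page_source
soup = BeautifulSoup(html, 'html.parser')
# This is the query that returns nothing
for item in soup.find_all('td', class_=['cell_ab team-home ', 'cell_ab team-home bold ']):
    listbox.insert(END, item.text)
root.mainloop()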
Recommended answer
The bs4 documentation says the following about matching using class_:
"Remember that a single tag can have multiple values for its class attribute. When you search for a tag that matches a certain CSS class, you're matching against any of its CSS classes."
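To make the quoted behavior concrete, here is a minimal sketch. The markup and team names are made up for illustration and are not taken from the real site, and the trailing spaces in the last query simply mirror the attempt in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only (not the real page)
html = """
<table><tr>
  <td class="cell_ab team-home">Arsenal</td>
  <td class="cell_ab team-home bold">Chelsea</td>
  <td class="cell_ab team-away">Everton</td>
</tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

# A single class name matches every tag that carries it, whatever else is set
home = [td.text for td in soup.find_all("td", class_="team-home")]

# A multi-class string is compared against the whole attribute value,
# so it only matches tags whose class attribute is exactly that string
exact = [td.text for td in soup.find_all("td", class_="cell_ab team-home")]

# With a trailing space the string equals neither a single class nor
# the full attribute value, so the list form from the question finds nothing
padded = soup.find_all(
    "td", class_=["cell_ab team-home ", "cell_ab team-home bold "]
)
```

In this sketch, home holds both home cells, exact holds only the first one, and padded is empty. That is consistent with the attempts in the question returning no results.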
According to the documentation, you'd have to use CSS selectors here, with the .select method. Thus something like this ought to do the trick:
soup.select('td.cell_ab.team-home')
This would select all <td>s that have both the cell_ab and team-home classes set, including <td>s that have additional classes, such as bold.
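As a rough check of the selector above, the following sketch runs it against a small hand-written table. The markup and team names are assumptions made up for the example, not the actual page source.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only (not the real page)
html = """
<table><tr>
  <td class="cell_ab team-home">Arsenal</td>
  <td class="cell_ab team-home bold">Chelsea</td>
  <td class="team-home cell_ab">Fulham</td>
  <td class="cell_ab team-away">Everton</td>
</tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

# td.cell_ab.team-home requires both classes, in any order, and
# tolerates extra classes such as bold
home_cells = soup.select("td.cell_ab.team-home")
home_teams = [td.get_text(strip=True) for td in home_cells]
print(home_teams)  # ['Arsenal', 'Chelsea', 'Fulham']
```

Because the selector tests each class separately, neither the order of the classes in the attribute nor any stray whitespace affects the match.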