Beautiful Soup find_all: matching multiple classes with one query


Question

I searched thoroughly for a solution on many websites and on here, but none of them works!

I am trying to scrape flashscore.com, and I want to parse every <td> whose class name is either cell_ab team-home or cell_ab team-home bold.

I tried using re, and also a list of class strings:

soup.find_all('td', {'class': re.compile(r"^(cell_ab team-home |cell_ab team-home  bold )$")})

soup.find_all('td', {'class': ['cell_ab team-home ', 'cell_ab team-home  bold ']})

Neither of them works.

Someone asked for the code, so here it is:

from tkinter import *
from selenium import webdriver
import threading
from bs4 import BeautifulSoup

browser = webdriver.Firefox()
browser.get('http://www.flashscore.com/')

HTML = browser.page_source
soap = BeautifulSoup(HTML)
for item in soap.find_all('td', class_=['cell_ab team-home ', 'cell_ab team-home  bold ']):
    Listbox.insert(END, item.text)

Recommended answer

The bs4 documentation says the following about matching with class_:

Remember that a single tag can have multiple values for its class attribute. When you search for a tag that matches a certain CSS class, you’re matching against any of its CSS classes.
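As a rough illustration of that passage, the sketch below runs find_all against a tiny HTML fragment. The fragment is made up for this example and is only assumed to resemble the flashscore markup; the variable names are likewise hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for illustration (not copied from flashscore.com).
html = """
<table><tr>
  <td class="cell_ab team-home">Home A</td>
  <td class="cell_ab team-home bold">Home B</td>
  <td class="cell_ab team-away">Away C</td>
</tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

# A single class name is compared against each class of a tag separately,
# so both "team-home" cells match, with or without "bold".
by_single_class = [td.get_text() for td in soup.find_all("td", class_="team-home")]

# A list means "match any of these values". The padded strings from the
# question are neither a single class token nor equal to the full class
# string of any tag, so nothing matches.
by_padded_list = soup.find_all(
    "td", class_=["cell_ab team-home ", "cell_ab team-home  bold "]
)

print(by_single_class)
print(len(by_padded_list))
```

This also suggests why the attempts in the question come back empty: the extra spaces inside the strings keep them from equalling any class value on the page.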


According to the documentation, you have to use CSS selectors here, via the .select method. Something like this ought to do the trick:

soup.select('td.cell_ab.team-home')

This selects every <td> that has both the cell_ab and team-home classes set, including <td>s that carry additional classes such as bold.
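A minimal, self-contained sketch of the selector approach follows. The HTML fragment and the variable names are assumptions made for illustration, not the real flashscore page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for illustration (not copied from flashscore.com).
html = """
<table><tr>
  <td class="cell_ab team-home">Home A</td>
  <td class="cell_ab team-home bold">Home B</td>
  <td class="cell_ab team-away">Away C</td>
  <td class="team-home bold cell_ab">Home D</td>
</tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

# td.cell_ab.team-home requires both classes on the same <td>;
# extra classes and the order of classes in the attribute do not matter.
home_cells = soup.select("td.cell_ab.team-home")
names = [td.get_text(strip=True) for td in home_cells]

print(names)  # ['Home A', 'Home B', 'Home D']
```

In the question's loop, soup.select('td.cell_ab.team-home') would take the place of the find_all call, and each returned tag still exposes .text as before.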
