Python - Beginner Scraping with Beautiful Soup 4 - onmouseover

Question

I'm a beginner Python 3 user, and I'm currently trying to scrape some sports stats for my fantasy football season. Previously I did this in a roundabout way (downloading the pages with HTTrack, converting them to Excel, and using VBA to combine my data). Now I'm trying to learn Python to improve my coding abilities.

I want to scrape this page, but I'm running into some difficulty selecting only the rows/tables I want. Here is how my code currently stands; it still contains a few leftovers from my experiments.

from urllib.request import urlopen  # import the library
from bs4 import BeautifulSoup   # Import BS
from bs4 import SoupStrainer    # Import Soup Strainer

page = urlopen('http://www.footywire.com/afl/footy/ft_match_statistics?mid=6172') # access the website
only_tables = SoupStrainer('table') # parse only table elements when parsing
soup = BeautifulSoup(page, 'html.parser')   # parse the html


# for row in soup('table',{'class':'tbody'}[0].tbody('tr')):
#   tds = row('td')
#   print (tds[0].string, tds[1].string)

# create variables to keep the data in

team = []
player = []
kicks = []
handballs = []
disposals = []
marks = []
goals = []
tackles = []
hitouts = []
inside50s = []
freesfor = []
freesagainst = []
fantasy = []
supercoach = []

table = soup.find_all('tr')


# print(soup.prettify())

print(table)

Right now I can select every 'tr' on the page, but I'm having trouble selecting only the rows that have the following attributes:

<tr bgcolor="#ffffff" onmouseout="this.bgColor='#ffffff';" onmouseover="this.bgColor='#cbcdd0';">

"onmouseover" seems to be the only attribute that is both common to and unique to the rows of the table I want to scrape.

Does anyone know how I can alter this line of code to select on this attribute?

table = soup.find_all('tr')

From here I am confident I can place the data into a dataframe, which I can hopefully export to CSV.

Any help would be greatly appreciated, as I have looked through the BS4 documentation with no luck.

Recommended answer

You can use this:

table = soup.find_all("tr", {"bgcolor": "#ffffff", "onmouseout": "this.bgColor='#ffffff';", "onmouseover": "this.bgColor='#cbcdd0';"})
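
Since the question notes that onmouseover is what sets the wanted rows apart, matching on the mere presence of that attribute may be enough, without spelling out every attribute value. Here is a minimal sketch of that idea. The `sample_html` string, its player names, and the second row's bgcolor are made up for illustration and are not taken from the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page: one header row and two data rows.
sample_html = """
<table>
  <tr><th>Player</th><th>Kicks</th></tr>
  <tr bgcolor="#ffffff" onmouseout="this.bgColor='#ffffff';" onmouseover="this.bgColor='#cbcdd0';"><td>Player A</td><td>10</td></tr>
  <tr bgcolor="#f2f4f7" onmouseout="this.bgColor='#f2f4f7';" onmouseover="this.bgColor='#cbcdd0';"><td>Player B</td><td>7</td></tr>
</table>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Passing True as the attribute value matches any tag that has the
# attribute at all, whatever its value is.
rows_by_keyword = soup.find_all("tr", onmouseover=True)

# The same selection written as a CSS attribute selector.
rows_by_css = soup.select("tr[onmouseover]")

for row in rows_by_keyword:
    print([td.get_text(strip=True) for td in row.find_all("td")])
```

Matching on presence rather than on exact values keeps working if some rows use a different background colour, which an exact match on bgcolor would miss.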

You can also pass a function as the filter:

tr_tag = soup.find_all(lambda tag: tag.name == "tr" and tag.get("bgcolor") == "#ffffff" and tag.get("onmouseout") == "this.bgColor='#ffffff';" and tag.get("onmouseover") == "this.bgColor='#cbcdd0';")

The advantage of this approach is that it uses the full power of BS: a function filter can test any condition on a tag, not just exact attribute values.
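
To tie this back to the goal of exporting to CSV, here is a rough sketch of a function filter feeding the standard-library csv module. The helper name `is_stat_row`, the sample HTML, and the column names are all hypothetical. Output goes to an in-memory buffer rather than a file so the example has no side effects.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page.
sample_html = """
<table>
  <tr><th>Player</th><th>Kicks</th></tr>
  <tr bgcolor="#ffffff" onmouseover="this.bgColor='#cbcdd0';"><td>Player A</td><td>10</td></tr>
  <tr bgcolor="#f2f4f7" onmouseover="this.bgColor='#cbcdd0';"><td>Player B</td><td>7</td></tr>
</table>
"""


def is_stat_row(tag):
    """Return True for <tr> tags whose onmouseover handler sets bgColor."""
    # tag.get() returns None instead of raising KeyError when the
    # attribute is absent, so rows without the handler are simply skipped.
    handler = tag.get("onmouseover") or ""
    return tag.name == "tr" and handler.startswith("this.bgColor=")


soup = BeautifulSoup(sample_html, "html.parser")

# One list of cell strings per matching row.
rows = [
    [td.get_text(strip=True) for td in tr.find_all("td")]
    for tr in soup.find_all(is_stat_row)
]

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["player", "kicks"])  # assumed column names
writer.writerows(rows)
print(buffer.getvalue())
```

Replacing the buffer with `open("stats.csv", "w", newline="")` would write a file instead. The same `rows` list could also be handed to a dataframe constructor if pandas is preferred.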
