Scrape data from a sports table using Python and Beautiful Soup


Problem description


I'm trying to scrape the data from a table, namely http://stats.nba.com/leagueTeamGeneral.html?pageNo=1&rowsPerPage=30. I'm having difficulty finding the right commands: I tried various parameters and none of them worked. Ideally the data would be returned in a format like this:
Atlanta Hawks,32, 48.8, 18, 14, .563, etc
Formatting the data is no problem; just getting at the required data is what is causing me grief.

    import urllib2
    from bs4 import BeautifulSoup

    page = 'http://stats.nba.com/leagueTeamGeneral.html?pageNo=1&rowsPerPage=30'
    page = urllib2.urlopen(page)
    soup = BeautifulSoup(page)
    for dS in soup.find_all(???):
        print(dS.get(???))

Solution

Use a tool like Firebug for Firefox to track down the request you need. Looking at the link you shared, Firebug's 'Net' tab shows that the data you are after comes from a subsequent request to http://www.nba.com/cmsinclude/desktopWrapperHeader_jsonp.html, which actually contains JSON data. I'm not sure BeautifulSoup will be handy here; try loading it with Python's json module instead.
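
As a rough illustration of loading the response with Python's json module, here is a minimal sketch. It assumes the response is JSONP (JSON wrapped in a callback call, as the `_jsonp` file name suggests) and that the payload has a list of rows under a `rows` key. The callback name, the key names, the sample values and the helper functions `strip_jsonp` and `rows_as_csv` are all made up for the example; inspect the real response in the 'Net' tab and adjust accordingly.

```python
import json
import re


def strip_jsonp(text):
    """Return the JSON text inside a JSONP wrapper such as callback({...});

    If the text does not look like a JSONP call, it is returned unchanged.
    """
    match = re.match(r"^\s*[\w$.]+\s*\((.*)\)\s*;?\s*$", text, re.DOTALL)
    return match.group(1) if match else text


def rows_as_csv(payload):
    """Parse the payload and return one comma-separated string per row.

    Assumes (hypothetically) that the rows live under a "rows" key.
    """
    data = json.loads(strip_jsonp(payload))
    return [",".join(str(value) for value in row) for row in data["rows"]]


# Hypothetical sample standing in for the body of the real response.
sample = (
    'callback({"headers": ["TEAM", "GP", "MIN", "W", "L", "WIN%"], '
    '"rows": [["Atlanta Hawks", 32, 48.8, 18, 14, 0.563]]});'
)

for line in rows_as_csv(sample):
    print(line)
```

In practice the `payload` string would be the body read from the request found in the 'Net' tab, for example via `urllib2.urlopen(url).read()` as in the question, rather than the hard-coded sample.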
