Scrape data from a sports table using Python and Beautiful Soup
Problem description
I'm trying to scrape the data from a table, namely the one at http://stats.nba.com/leagueTeamGeneral.html?pageNo=1&rowsPerPage=30. I am having difficulty finding the right commands: I tried various parameters and none of them worked. Ideally the data would be returned in the following format.
Example:
Atlanta Hawks, 32, 48.8, 18, 14, .563, etc.
I can format the data with no problem; it is just getting the required data that is causing me grief.
import urllib2
from bs4 import BeautifulSoup

# Fetch the stats page and parse the returned HTML
url = 'http://stats.nba.com/leagueTeamGeneral.html?pageNo=1&rowsPerPage=30'
page = urllib2.urlopen(url)
soup = BeautifulSoup(page)

# ??? marks the arguments I have not been able to work out
for dS in soup.find_all(???):
    print(dS.get(???))
Recommended answer
Use a tool like Firebug for Firefox to track down the request you actually need. Looking at the link you shared, Firebug's Net tab shows that the data you are after arrives in a subsequent request to http://www.nba.com/cmsinclude/desktopWrapperHeader_jsonp.html, which contains JSON data rather than an HTML table. I'm not sure BeautifulSoup will be handy here; try loading the response with Python's json module instead.
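The suggestion to load the response with Python's json module could look roughly like the sketch below. It is only an illustration under stated assumptions: the helper name strip_jsonp, the callback name, and the "rows" field are all hypothetical, since the real layout of the response is not shown here and should be checked in the Net tab first. The sample string stands in for the body you would read from the request (for example with urllib2.urlopen(url).read()).

```python
import json
import re


def strip_jsonp(text):
    """Return the JSON payload inside a JSONP wrapper such as callback({...});

    If the text does not look like a JSONP call, it is returned unchanged.
    (Hypothetical helper, not part of any library.)
    """
    match = re.match(r'^\s*[\w$.]+\s*\((.*)\)\s*;?\s*$', text, re.DOTALL)
    return match.group(1) if match else text


# Hypothetical response body: the callback name and the "rows" key are
# assumptions made for illustration, not the endpoint's real format.
sample = 'callback({"rows": [["Atlanta Hawks", 32, 48.8, 18, 14, 0.563]]});'

# Parse the payload into ordinary Python dicts and lists
data = json.loads(strip_jsonp(sample))

# Join each row into one comma-separated line
lines = [",".join(str(value) for value in row) for row in data["rows"]]
for line in lines:
    print(line)
```

Once the data is a plain Python structure, producing the comma-separated layout from the question is a simple join, so no HTML parsing is needed at all.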