How to work with data from NBA.com?


Question

I found Greg Reda's blog post about scraping HTML from nba.com:

http://www.gregreda.com/2015/02/15/web-scraping-finding-the-api/

I tried to work with the code he wrote there:

import requests
import json

url = 'http://stats.nba.com/stats/leaguedashteamshotlocations?Conference=&DateFr' + \
      'om=&DateTo=&DistanceRange=By+Zone&Division=&GameScope=&GameSegment=&LastN' + \
      'Games=0&LeagueID=00&Location=&MeasureType=Opponent&Month=0&OpponentTeamID' + \
      '=0&Outcome=&PORound=0&PaceAdjust=N&PerMode=PerGame&Period=0&PlayerExperien' + \
      'ce=&PlayerPosition=&PlusMinus=N&Rank=N&Season=2014-15&SeasonSegment=&Seas' + \
      'onType=Regular+Season&ShotClockRange=&StarterBench=&TeamID=0&VsConference=&VsDivision='

response = requests.get(url)
response.raise_for_status()
shots = response.json()['resultSets']['rowSet']

avg_percentage = shots['OPP_FG_PCT']

print(avg_percentage)

But it returns:

Traceback (most recent call last):
  File "C:\Python34\nba.py", line 91, in <module>
    avg_percentage = shots['OPP_FG_PCT']
TypeError: list indices must be integers, not str

I know only basic Python so I couldn't figure out how to get a list of integers from the data. Can anybody explain?

Solution

Evidently the data structure has changed since Greg Reda wrote that post. Before exploring the data, I recommend that you save it to a file via pickling. That way you don't have to keep hitting the NBA server and waiting for a download each time you modify and rerun the script.

The following script checks for the existence of the pickled data to avoid unnecessary downloading:

import os
import pickle

import requests

# Query URL for the opponent shot-location stats, split across lines for readability.
url = ('http://stats.nba.com/stats/leaguedashteamshotlocations?Conference=&DateFr'
       'om=&DateTo=&DistanceRange=By+Zone&Division=&GameScope=&GameSegment=&LastN'
       'Games=0&LeagueID=00&Location=&MeasureType=Opponent&Month=0&OpponentTeamID'
       '=0&Outcome=&PORound=0&PaceAdjust=N&PerMode=PerGame&Period=0&PlayerExperien'
       'ce=&PlayerPosition=&PlusMinus=N&Rank=N&Season=2014-15&SeasonSegment=&Seas'
       'onType=Regular+Season&ShotClockRange=&StarterBench=&TeamID=0&VsConference=&VsDivision=')
print(url)

file_name = 'result_sets.pickled'

if os.path.isfile(file_name):
    # Reuse the cached copy instead of downloading again.
    with open(file_name, 'rb') as f:
        result_sets = pickle.load(f)
else:
    # First run: download the data and cache it for later runs.
    response = requests.get(url)
    response.raise_for_status()
    result_sets = response.json()['resultSets']
    with open(file_name, 'wb') as f:
        pickle.dump(result_sets, f)

print(result_sets.keys())
print(result_sets['headers'][1])
print(result_sets['rowSet'][0])
print(len(result_sets['rowSet']))
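If you end up caching more than one download, the check-then-load logic can be pulled into a small helper. The following is only a sketch: load_or_fetch and fake_fetch are hypothetical names introduced here, and fake_fetch stands in for the real requests call so the example runs without touching the NBA server.

```python
import os
import pickle
import tempfile


def load_or_fetch(file_name, fetch):
    """Return the pickled object in file_name if it exists; otherwise call fetch() and cache the result."""
    if os.path.isfile(file_name):
        with open(file_name, 'rb') as f:
            return pickle.load(f)
    data = fetch()
    with open(file_name, 'wb') as f:
        pickle.dump(data, f)
    return data


# Hypothetical stand-in for the real download, so the sketch runs offline.
calls = []


def fake_fetch():
    calls.append(1)
    return {'name': 'demo', 'headers': [], 'rowSet': []}


cache_path = os.path.join(tempfile.mkdtemp(), 'result_sets.pickled')
first = load_or_fetch(cache_path, fake_fetch)   # cache miss: calls fake_fetch
second = load_or_fetch(cache_path, fake_fetch)  # cache hit: reads the pickle
print(first == second, len(calls))
```

In a real script you would pass a function that performs the requests.get call and returns response.json()['resultSets'] in place of fake_fetch.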

Once you have result_sets in hand, you can examine the data. If you print it, you'll see that it's a dictionary. You can extract the dictionary keys:

print(result_sets.keys())

Currently the keys are 'headers', 'rowSet', and 'name'. You can inspect the headers:

print(result_sets['headers'])

I probably know less about these statistics than you do. However, by looking at the data, I've been able to figure out that result_sets['rowSet'] contains 30 rows of 23 elements each. The 23 columns are identified by result_sets['headers'][1]. Try this:

print(result_sets['headers'][1])

That will show you the 23 column names. Now take a look at the first row of team data:

print(result_sets['rowSet'][0])
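A raw row is easier to read when each value sits next to its column name. The sketch below uses a made-up miniature of result_sets (4 columns instead of 23, invented numbers) and assumes that result_sets['headers'][1] is a dictionary whose 'columnNames' key holds the list of names. That layout is an assumption, so check it against what the print statements above show before relying on it.

```python
# Hypothetical miniature of result_sets; the values and the 'columnNames'
# key are assumptions made for illustration, not real NBA data.
result_sets = {
    'name': 'ShotLocations',
    'headers': [
        {'name': 'SHOT_CATEGORY', 'columnNames': ['Zone A']},
        {'name': 'columns',
         'columnNames': ['TEAM_ID', 'TEAM_NAME', 'OPP_FGA', 'OPP_FG_PCT']},
    ],
    'rowSet': [
        [1, 'Team One', 20.0, 0.5],
        [2, 'Team Two', 25.0, 0.4],
    ],
}

column_names = result_sets['headers'][1]['columnNames']
first_row = result_sets['rowSet'][0]

# Pair each column name with the value in the same position.
labelled = list(zip(column_names, first_row))
for name, value in labelled:
    print(name, value)
```

A list of pairs is used rather than a dictionary so that no value is lost if the same column name appears more than once.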

Now you see the 23 values reported for the Atlanta Hawks. You can iterate over the rows in result_sets['rowSet'] to extract whatever values interest you and to compute aggregate information such as totals and averages.
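This also explains the TypeError in the question: each row is a plain list, so a value has to be looked up by integer position rather than by a name such as 'OPP_FG_PCT'. Here is a minimal sketch of computing an average that way. The data is a made-up miniature, and the 'columnNames' key is an assumption about the layout of result_sets['headers'][1].

```python
# Hypothetical miniature of result_sets; invented numbers, assumed layout.
result_sets = {
    'headers': [
        {},  # placeholder for the first header entry, unused here
        {'columnNames': ['TEAM_ID', 'TEAM_NAME', 'OPP_FGA', 'OPP_FG_PCT']},
    ],
    'rowSet': [
        [1, 'Team One', 20.0, 0.5],
        [2, 'Team Two', 30.0, 0.25],
    ],
}

column_names = result_sets['headers'][1]['columnNames']

# Translate the column name into an integer position.
# list.index returns the first match only.
pct_index = column_names.index('OPP_FG_PCT')

# Collect that column from every row, then average it.
percentages = [row[pct_index] for row in result_sets['rowSet']]
avg_percentage = sum(percentages) / len(percentages)

print(percentages)
print(avg_percentage)
```

If a column name occurs several times in the real data, list.index will only find the first occurrence, so in that case pick the position by inspecting the printed headers instead.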
