How to work with data from NBA.com?

Views: 181  Posted: 2018/6/21 13:52:43  Tags: python html web-scraping text-processing string-parsing

Question

I found Greg Reda's blog post about scraping HTML from nba.com:

http://www.gregreda.com/2015/02/15/web-scraping-finding-the-api/

I tried to work with the code he wrote there:

    import requests
    import json

    url = 'http://stats.nba.com/stats/leaguedashteamshotlocations?Conference=&DateFr' + \
          'om=&DateTo=&DistanceRange=By+Zone&Division=&GameScope=&GameSegment=&LastN' + \
          'Games=0&LeagueID=00&Location=&MeasureType=Opponent&Month=0&OpponentTeamID' + \
          '=0&Outcome=&PORound=0&PaceAdjust=N&PerMode=PerGame&Period=0&PlayerExperien' + \
          'ce=&PlayerPosition=&PlusMinus=N&Rank=N&Season=2014-15&SeasonSegment=&Seas' + \
          'onType=Regular+Season&ShotClockRange=&StarterBench=&TeamID=0&VsConference=&VsDivision='

    response = requests.get(url)
    response.raise_for_status()
    shots = response.json()['resultSets']['rowSet']

    avg_percentage = shots['OPP_FG_PCT']

    print(avg_percentage)

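But it returns:

    Traceback (most recent call last):
      File "C:\Python34\nba.py", line 91, in <module>
        avg_percentage = shots['OPP_FG_PCT']
    TypeError: list indices must be integers, not str

I know only basic Python, so I couldn't figure out how to get a list of integers from the data. Can anybody explain?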

Solution

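Evidently the data structure has changed since Greg Reda wrote that post. Before exploring the data, I recommend that you save it to a file via pickling. That way you don't have to keep hitting the NBA server and waiting for a download each time you modify and rerun the script.

The following script checks for the existence of the pickled data to avoid unnecessary downloading: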

    import os
    import pickle

    import requests

    url = 'http://stats.nba.com/stats/leaguedashteamshotlocations?Conference=&DateFr' + \
          'om=&DateTo=&DistanceRange=By+Zone&Division=&GameScope=&GameSegment=&LastN' + \
          'Games=0&LeagueID=00&Location=&MeasureType=Opponent&Month=0&OpponentTeamID' + \
          '=0&Outcome=&PORound=0&PaceAdjust=N&PerMode=PerGame&Period=0&PlayerExperien' + \
          'ce=&PlayerPosition=&PlusMinus=N&Rank=N&Season=2014-15&SeasonSegment=&Seas' + \
          'onType=Regular+Season&ShotClockRange=&StarterBench=&TeamID=0&VsConference=&VsDivision='
    print(url)

    file_name = 'result_sets.pickled'

    if os.path.isfile(file_name):
        # Reuse the copy saved by an earlier run instead of downloading again.
        with open(file_name, 'rb') as f:
            result_sets = pickle.load(f)
    else:
        # First run: download the data and save it for later runs.
        response = requests.get(url)
        response.raise_for_status()
        result_sets = response.json()['resultSets']
        with open(file_name, 'wb') as f:
            pickle.dump(result_sets, f)

    print(result_sets.keys())
    print(result_sets['headers'][1])
    print(result_sets['rowSet'][0])
    print(len(result_sets['rowSet']))

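The same check-then-download pattern can be wrapped in a small reusable function. The following is only a sketch: `load_or_fetch` and `fake_fetch` are hypothetical names introduced here, and the demonstration uses placeholder data and a temporary file rather than the real NBA request.

```python
import os
import pickle
import tempfile


def load_or_fetch(file_name, fetch):
    """Return cached data from file_name, calling fetch() only on a cache miss."""
    if os.path.isfile(file_name):
        with open(file_name, 'rb') as f:
            return pickle.load(f)
    data = fetch()
    with open(file_name, 'wb') as f:
        pickle.dump(data, f)
    return data


# Stand-in for the real download; it records how often it is called.
calls = []


def fake_fetch():
    calls.append(1)
    # Placeholder structure, invented for illustration; not real NBA data.
    return {'name': 'example', 'headers': [], 'rowSet': []}


cache_path = os.path.join(tempfile.mkdtemp(), 'result_sets.pickled')
first = load_or_fetch(cache_path, fake_fetch)   # cache miss: fake_fetch runs
second = load_or_fetch(cache_path, fake_fetch)  # cache hit: read from disk
print(len(calls))  # 1, because the second call never reaches fake_fetch
```

In the script above, `fetch` would wrap the `requests.get(url)`, `raise_for_status()`, and `.json()['resultSets']` steps. Delete the pickled file whenever you want fresh data, and only unpickle files you created yourself, since `pickle.load` can run arbitrary code from an untrusted file.
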
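Once you have result_sets in hand, you can examine the data. If you print it, you'll see that it's a dictionary. You can extract the dictionary keys:

    print(result_sets.keys())

Currently the keys are 'headers', 'rowSet', and 'name'. You can inspect the headers:

    print(result_sets['headers'])
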
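Printing a large nested structure in full can be hard to read. As a rough sketch, a helper like the hypothetical `summarize` below reports only the types and sizes of nested dictionaries and lists. The sample dictionary is a placeholder with the same three keys; its contents are invented and its inner layout is an assumption, not the real response.

```python
def summarize(obj, depth=2, indent=0):
    """Return lines describing the type and size of nested dicts and lists."""
    pad = ' ' * indent
    if isinstance(obj, dict):
        lines = [f"{pad}dict with keys {sorted(obj)}"]
        if depth > 0:
            for key in sorted(obj):
                lines.append(f"{pad}  {key!r}:")
                lines.extend(summarize(obj[key], depth - 1, indent + 4))
        return lines
    if isinstance(obj, list):
        lines = [f"{pad}list of {len(obj)} items"]
        if depth > 0 and obj:
            # Only the first item is inspected, assuming the rest look alike.
            lines.append(f"{pad}  first item:")
            lines.extend(summarize(obj[0], depth - 1, indent + 4))
        return lines
    return [f"{pad}{type(obj).__name__}"]


# Placeholder structure, invented for illustration; not real NBA data.
sample = {
    'name': 'example',
    'headers': [['A'], ['TEAM', 'X', 'Y']],
    'rowSet': [['Team A', 1, 2], ['Team B', 3, 4]],
}
print('\n'.join(summarize(sample)))
```

Calling `summarize(result_sets)` on the downloaded data should show at a glance how many rows `rowSet` holds and how long the first row is.
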
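I probably know less about these statistics than you do. However, by looking at the data, I've been able to figure out that result_sets['rowSet'] contains 30 rows of 23 elements each. The 23 columns are identified by result_sets['headers'][1]. Try this:

    print(result_sets['headers'][1])

That will show you the 23 column names. Now take a look at the first row of team data:

    print(result_sets['rowSet'][0])

Now you see the 23 values reported for the Atlanta Hawks. You can iterate over the rows in result_sets['rowSet'] to extract whatever values interest you and to compute aggregate information such as totals and averages.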
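Because each row is a plain list, a value has to be looked up by its position, which is why indexing with the string 'OPP_FG_PCT' raised the TypeError in the question. The sketch below shows one way to go from a column name to its values and an average. It rests on assumptions: the helper names are hypothetical, the column names are taken to be either a plain list at `result_sets['headers'][1]` or a dictionary holding them under a 'columnNames' key (check your printed headers), and the sample data is invented.

```python
def column_names(result_sets):
    """Return the column names, assuming they live in result_sets['headers'][1]."""
    header = result_sets['headers'][1]
    # Assumption: the header is either a plain list of names or a dict that
    # stores them under a 'columnNames' key. Adjust to what your data shows.
    if isinstance(header, dict):
        return header['columnNames']
    return header


def column_values(result_sets, name):
    """Return one list of values for each column called `name`.

    A name may occur more than once, so every matching position is used.
    """
    names = column_names(result_sets)
    positions = [i for i, col in enumerate(names) if col == name]
    if not positions:
        raise KeyError(name)
    return [[row[i] for row in result_sets['rowSet']] for i in positions]


def average(values):
    """Arithmetic mean that skips missing (None) entries."""
    present = [v for v in values if v is not None]
    if not present:
        return None
    return sum(present) / len(present)


# Placeholder data, invented for illustration; not real NBA statistics.
sample = {
    'name': 'example',
    'headers': [
        ['ignored'],
        ['TEAM_NAME', 'OPP_FG_PCT'],
    ],
    'rowSet': [
        ['Team A', 0.5],
        ['Team B', 0.25],
    ],
}

# Pair each column name with the matching value in the first row.
print(list(zip(column_names(sample), sample['rowSet'][0])))

# Average every column named OPP_FG_PCT across all rows.
for values in column_values(sample, 'OPP_FG_PCT'):
    print(average(values))  # 0.375 for the placeholder rows
```

If the same column name appears several times in your headers, `column_values` returns one list per occurrence, so you can decide which occurrence you actually want before averaging.
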