What headers am I missing to scrape the NBA Stats data?

Problem description

A couple of days ago in Power BI, I was able to create a web query that allowed me to extract the JSON data from NBA Player Stats without using any headers. As of today, I have noticed that the query no longer works; I am getting the following error message:

DataSource.Error: The underlying connection was closed. An unexpected error occurred on a receive.
Details: https://stats.nba.com/stats/leaguedashplayerstats?College=&Conference=&Country=&DateFrom=&DateTo=&Division=&DraftPick=&DraftYear=&GameScope=&GameSegment=&Height=&LastNGames=0&LeagueID=00&Location=&MeasureType=Base&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PaceAdjust=N&PerMode=PerGame&Period=0&PlayerExperience=&PlayerPosition=&PlusMinus=N&Rank=N&Season=2019-20&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&StarterBench=&TeamID=0&TwoWay=0&VsConference=&VsDivision=&Weight=

On a related note, I used to be able to pull the JSON data from NBA Team Stats using https://stats.nba.com/ as a Referer header, but now it's giving me the same error message as shown above. To try and get around these errors, I have tried entering the following headers:

Host: stats.nba.com
Connection: keep-alive
Accept: application/json
x-nba-stats-token: true
User-Agent: Chrome/79.0.3945.130
x-nba-stats-origin: stats
Referer: https://stats.nba.com/
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9

When I submit the query with the above headers, it returns the following error message:

Unable to connect

We encountered an error while trying to connect.

Details: "The 'Host' header must be modified using the appropriate property or method.
Parameter name: name"

I have run out of ideas for how to get the query to run properly. I'm really new to web scraping and HTML, and I've been trying to teach myself. Any help is greatly appreciated.

Recommended answer

All headers for the GET request:

Host: stats.nba.com
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
Accept: application/json, text/plain, */*
x-nba-stats-token: true
X-NewRelic-ID: VQECWF5UChAHUlNTBwgBVw==
DNT: 1
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36
x-nba-stats-origin: stats
Sec-Fetch-Site: same-origin
Sec-Fetch-Mode: cors
Referer: https://stats.nba.com/teams/traditional/?sort=TEAM_NAME&dir=-1
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US;q=0.9,en;q=0.7

URL:

https://stats.nba.com/stats/leaguedashteamstats?Conference=&DateFrom=&DateTo=&Division=&GameScope=&GameSegment=&LastNGames=0&LeagueID=00&Location=&MeasureType=Base&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PaceAdjust=N&PerMode=PerGame&Period=0&PlayerExperience=&PlayerPosition=&PlusMinus=N&Rank=N&Season=2019-20&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&StarterBench=&TeamID=0&TwoWay=0&VsConference=&VsDivision=
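Rather than hand-editing the long query string, the URL above can be assembled from a parameter dictionary. The following is a minimal standard-library Python sketch; `build_stats_url` and `TEAM_STATS_PARAMS` are hypothetical names, and the parameter set and order are simply copied from the URL above.

```python
from urllib.parse import urlencode

# Base endpoint taken from the URL above.
BASE_URL = "https://stats.nba.com/stats/leaguedashteamstats"

# Query parameters in the same order as the URL above; empty strings
# are encoded as "Name=" with no value.
TEAM_STATS_PARAMS = {
    "Conference": "",
    "DateFrom": "",
    "DateTo": "",
    "Division": "",
    "GameScope": "",
    "GameSegment": "",
    "LastNGames": "0",
    "LeagueID": "00",
    "Location": "",
    "MeasureType": "Base",
    "Month": "0",
    "OpponentTeamID": "0",
    "Outcome": "",
    "PORound": "0",
    "PaceAdjust": "N",
    "PerMode": "PerGame",
    "Period": "0",
    "PlayerExperience": "",
    "PlayerPosition": "",
    "PlusMinus": "N",
    "Rank": "N",
    "Season": "2019-20",
    "SeasonSegment": "",
    "SeasonType": "Regular Season",
    "ShotClockRange": "",
    "StarterBench": "",
    "TeamID": "0",
    "TwoWay": "0",
    "VsConference": "",
    "VsDivision": "",
}


def build_stats_url(params: dict, base: str = BASE_URL) -> str:
    """Return base?query; urlencode encodes spaces as '+' by default."""
    return f"{base}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_stats_url(TEAM_STATS_PARAMS))
```

To query another season, only the `Season` value needs to change; the question's `leaguedashplayerstats` URL could be built the same way from its own parameter set.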

Required headers:

Accept: application/json, text/plain, */*
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36
x-nba-stats-origin: stats
Sec-Fetch-Site: same-origin
Sec-Fetch-Mode: cors
Referer: https://stats.nba.com/teams/traditional/?sort=TEAM_NAME&dir=-1

Not sure whether these are required:

x-nba-stats-token: true
X-NewRelic-ID: VQECWF5UChAHUlNTBwgBVw==
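For readers working outside Power BI, here is a minimal Python sketch (standard library only) of sending the required headers listed above, with the two uncertain ones available behind a flag. `build_request`, `fetch_json` and `include_optional` are hypothetical names. Host is deliberately not set, on the assumption that the HTTP client derives it from the URL, and Accept-Encoding is left out so the response should not arrive compressed. The X-NewRelic-ID value is copied from the answer and may well have expired.

```python
import json
import urllib.request

# Headers the answer lists as required, copied verbatim.
REQUIRED_HEADERS = {
    "Accept": "application/json, text/plain, */*",
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/79.0.3945.130 Safari/537.36"
    ),
    "x-nba-stats-origin": "stats",
    "Sec-Fetch-Site": "same-origin",
    "Sec-Fetch-Mode": "cors",
    "Referer": "https://stats.nba.com/teams/traditional/?sort=TEAM_NAME&dir=-1",
}

# Headers the answer is unsure about; the token value may be stale.
OPTIONAL_HEADERS = {
    "x-nba-stats-token": "true",
    "X-NewRelic-ID": "VQECWF5UChAHUlNTBwgBVw==",
}


def build_request(url: str, include_optional: bool = False) -> urllib.request.Request:
    """Create a GET request with the required (and optionally the uncertain) headers.

    Host is not set: urllib adds it from the URL when the request is sent.
    """
    headers = dict(REQUIRED_HEADERS)
    if include_optional:
        headers.update(OPTIONAL_HEADERS)
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_json(url: str, include_optional: bool = False, timeout: float = 30.0):
    """Send the request and parse the JSON body (performs network I/O)."""
    req = build_request(url, include_optional)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Note that `urllib.request.Request` normalizes header names with `str.capitalize()` (so `x-nba-stats-origin` is stored as `X-nba-stats-origin`); HTTP header names are case-insensitive, so this should not matter to the server.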

Possible problem:

  1. You were detected as a bot and blocked

The X-NewRelic-ID header is a token (possibly with an expiry). It is probably assigned based on parameters such as the IP address and User-Agent, among others.
You can get a fresh X-NewRelic-ID from the HTML returned by a GET request to https://stats.nba.com/. Here is the part of the HTML containing the xpid token: <script type="text/javascript">(window.NREUM||(NREUM={})).loader_config={xpid:"VQECWF5UChAHUlNTBwgBVw==",licenseKey:"09f0cb5c68",applicationID:"76210961"};
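A minimal Python sketch of pulling a fresh token out of the homepage HTML, assuming the page still embeds `loader_config={xpid:"..."}` as in the snippet above. `extract_xpid`, `fetch_fresh_xpid` and `with_newrelic_id` are hypothetical helper names. Since the token may be tied to IP and User-Agent, it seems sensible to fetch the homepage with the same User-Agent you later send to the stats API.

```python
import re
import urllib.request

# Matches xpid:"..." inside the NREUM loader_config shown above.
XPID_PATTERN = re.compile(r'xpid:"([^"]+)"')


def extract_xpid(html: str):
    """Return the xpid value found in the page HTML, or None if absent."""
    match = XPID_PATTERN.search(html)
    return match.group(1) if match else None


def fetch_fresh_xpid(user_agent: str, timeout: float = 30.0):
    """GET https://stats.nba.com/ and extract the xpid (performs network I/O).

    Assumption: the homepage still contains the loader_config snippet;
    if the page has changed, this returns None.
    """
    req = urllib.request.Request(
        "https://stats.nba.com/",
        headers={"User-Agent": user_agent},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    return extract_xpid(html)


def with_newrelic_id(headers: dict, xpid: str) -> dict:
    """Return a copy of headers with X-NewRelic-ID set to the fresh token."""
    updated = dict(headers)
    updated["X-NewRelic-ID"] = xpid
    return updated
```

If `fetch_fresh_xpid` returns a token, merging it into your request headers with `with_newrelic_id` replaces any stale value before calling the stats endpoint.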
