XHR request URL says it does not exist when attempting to parse its content


Problem description

Before I build a full solution to my problem using Scrapy, here is a simplified version of what I want to do:

import requests

url = 'http://www.whoscored.com/stageplayerstatfeed/?field=1&isAscending=false&orderBy=Rating&playerId=-1&stageId=9155&teamId=32"'

params = {'d': date.strftime('%Y%m'), 'isAggregate': 'false'}
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36'}

response = requests.get(url, params=params, headers=headers)

fixtures = response.body
#fixtures = literal_eval(response.content)
print fixtures 

When I run this code, the response says that the above URL does not exist. The URL relates to an XHR request that is submitted when you toggle from the 'Overall' to the 'Home' tab of the main table on this page:

http://www.whoscored.com/Teams/32/

If you activate XHR logging within the Console of Google Developer Tools you can see both the XHR request and the response sent from the server in the form of a dictionary (which is the expected format).

Can anyone tell me why the above code is not returning the data I would expect to see?

Thanks

Solution

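You have several problems:

  • the URL should be http://www.whoscored.com/stageplayerstatfeed, without the embedded query string and the stray trailing double quote
  • the GET parameters are wrong: field, isAscending, orderBy, playerId, stageId and teamId belong in params, not d and isAggregate
  • important required headers are missing (the fixed version adds X-Requested-With, Host and Referer)
  • you need response.json(), not response.body (a requests response has no body attribute)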
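The first two points can be checked offline. The sketch below uses only the standard library (not requests itself) to split the URL from the question into a base URL and a params dict, dropping the stray double quote along the way. The variable names are illustrative, and the final line only approximates the URL that an HTTP client would build from these parts.

```python
from urllib.parse import urlsplit, parse_qsl, urlencode

# URL exactly as posted in the question (note the stray double quote at the end).
original_url = ('http://www.whoscored.com/stageplayerstatfeed/'
                '?field=1&isAscending=false&orderBy=Rating'
                '&playerId=-1&stageId=9155&teamId=32"')

parts = urlsplit(original_url)

# Base endpoint without the query string or the trailing slash.
base_url = parts.scheme + '://' + parts.netloc + parts.path.rstrip('/')

# Recover the query parameters and strip the stray quote from each value.
params = {key: value.strip('"') for key, value in parse_qsl(parts.query)}

print(base_url)
print(params)
print(base_url + '?' + urlencode(params))
```

Keeping the endpoint and the parameters separate like this makes a typo such as the trailing quote much easier to spot than when everything is packed into one string.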

The fixed version:

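import requests

# Base endpoint only; the query string is passed separately via params.
url = 'http://www.whoscored.com/stageplayerstatfeed'
params = {
    'field': '1',
    'isAscending': 'false',
    'orderBy': 'Rating',
    'playerId': '-1',
    'stageId': '9155',
    'teamId': '32',
}
# Headers that make the request look like the XHR call issued by the page.
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36',
    'X-Requested-With': 'XMLHttpRequest',
    'Host': 'www.whoscored.com',
    'Referer': 'http://www.whoscored.com/Teams/32/',
}

response = requests.get(url, params=params, headers=headers)

# Decode the JSON body into Python objects (a list of dicts here).
fixtures = response.json()
# print() with a single argument works on both Python 2 and Python 3.
print(fixtures)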

Prints:

[
    {
        u'AccurateCrosses': 0,
        u'AccurateLongBalls': 10,
        u'AccuratePasses': 89,
        u'AccurateThroughBalls': 0,
        u'AerialLost': 2,
        u'AerialWon': 4,
        ...
    },
    ...
]
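Once response.json() has returned a list of dictionaries like the one above, individual statistics can be picked out with ordinary list and dict operations. The following is a minimal sketch: pick_stats is a hypothetical helper, not part of requests, and the sample record contains only the six fields visible in the printed output.

```python
def pick_stats(records, keys):
    """Return one dict per record containing only the requested keys."""
    return [{key: record.get(key) for key in keys} for record in records]


# Sample limited to the fields visible in the printed output above;
# a real response contains more fields and more records.
sample = [{
    'AccurateCrosses': 0,
    'AccurateLongBalls': 10,
    'AccuratePasses': 89,
    'AccurateThroughBalls': 0,
    'AerialLost': 2,
    'AerialWon': 4,
}]

print(pick_stats(sample, ['AccuratePasses', 'AerialWon']))
```

Using record.get(key) rather than record[key] means a field missing from one record yields None instead of raising KeyError, which is convenient when the exact shape of the feed is not documented.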
