XMLHttpRequest mimicking script works on one web page, but not another

Question

I am using Python.org version 2.7 64 bit on Windows 8 64 bit. I have some code that iterates through a series of XHR requests to pull down FA Cup data from a website. Each value in the dictionary 'year_tournament_map' represents the ID code for each season's FA Cup, which are parsed in turn.

The code is as follows:

import json
import requests
import time

from datetime import date, timedelta

year_tournament_map = {
    2013: 8273,
    2012: 6978,
    2011: 5861,
    2010: 4940,
    2009: 3419,
    2008: 2689,
    2007: 2175,
    2006: 1645,
    2005: 1291,
    2004: 903,
    2003: 579,
    2002: 421,
    2001: 243,
    2000: 114,
    1999: 26,
}

years = sorted(year_tournament_map.keys())
url = 'http://www.whoscored.com/tournamentsfeed/%s/Fixtures/'
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36'}

for year in years:
    start_date = date(year, 11, 1)
    end_date = date(year + 1, 5, 31)
    delta = end_date - start_date

    for days in range(delta.days + 1):
        time.sleep(0.5)

        test_date = start_date + timedelta(days=days)

        params = {'d': str(test_date).replace('-', ''), 'isAggregate': 'false'}
        response = requests.get(url % year_tournament_map[year], params=params, headers=headers)

        try:
            json_data = response.content.replace("'", '"').replace(',,', ',null,')
            fixtures = json.loads(json_data)

        except ValueError:
            print "Error!!!"

        else:

            if fixtures:  # If there are fixtures
                print ",\n".join([", ".join(str(x) for x in fixture) for fixture in fixtures])  # `fixtures` is a nested list

            else:
                print "No Fixtures Today: %s" % test_date

This works great, so I decided to experiment using this method on other tournaments, for example the English Premier League. I replaced the above dictionary with the following one, which has the ID codes for the Premier League instead of the FA Cup:

year_tournament_map = {
 1999: 2,
 2000: 85,
 2001: 191,
 2002: 299,
 2003: 429,
 2004: 594,
 2005: 836,
 2006: 667,
 2007: 1256,
 2008: 1539,
 2009: 1849,
 2010: 2458,
 2011: 2935,
 2012: 3389,
 2013: 3853,
 2014: 4311, }

When run, however, this does not work as anticipated. The second season produces international fixtures, and the fourth produces fixtures from either the Finnish league or cup. It then falls over with an error saying I am trying to print non-ASCII (Unicode) characters to the screen.
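The printing error described here likely comes from calling `str()` on text that contains non-ASCII characters, which raises `UnicodeEncodeError` on Python 2. A minimal sketch of one way around it, assuming the fixtures are a nested list as in the script above; `format_fixtures`, `console_safe`, and the sample rows are hypothetical names and data introduced only for illustration:

```python
# Hypothetical helpers, not part of the original script.
try:
    text_type = unicode  # Python 2
except NameError:
    text_type = str  # Python 3


def format_fixtures(fixtures):
    """Join a nested list of fixture fields into one block of text."""
    rows = [u", ".join(text_type(field) for field in fixture)
            for fixture in fixtures]
    return u",\n".join(rows)


def console_safe(text, encoding="ascii"):
    """Replace characters the target encoding cannot represent with '?'."""
    return text.encode(encoding, "replace").decode(encoding)


# Made-up sample rows; the second team name contains non-ASCII characters.
sample = [[1, u"HJK", u"Jyv\u00e4skyl\u00e4"], [2, u"Arsenal", u"Chelsea"]]
print(console_safe(format_fixtures(sample)))
```

This only addresses the printing failure. It does not explain why fixtures from the wrong competition come back in the first place.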

I was advised that the URL 'http://www.whoscored.com/tournamentsfeed/%s/Fixtures/' can be observed using my browser's developer tools, but I was unable to locate it.

What I would like to know is:

1) Am I using the correct URL for the XHR on the Premier League data?
2) Where in the source code can a reference to the above URL be found?
3) Why is my code returning irrelevant/incorrect data compared with what is on the page that I am browsing?

Thanks

Recommended answer

This was resolved in the end once I got to grips with the Google Chrome Developer Tools a little more. On the 'Console' tab of the Dev Tools, if you right-click you get an option to log XMLHttpRequests. Once this is turned on, when you change the dates on the calendar you see the XHR submission to 'tournamentsfeed'. For some reason, on the Premier League data the numeric season identifiers in that request are different from those displayed in the address bar at the top of the page. With the FA Cup data this is not the case.
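To check a season identifier seen in the console, it can help to rebuild the request for a single date and compare it with the logged XHR. A minimal sketch, reusing the URL pattern and query parameters from the question's script; `FEED_URL` and `build_feed_request` are hypothetical names introduced here, and the ID 89 is the 2000 entry from the corrected dictionary in this answer:

```python
from datetime import date

# URL pattern taken from the question's script.
FEED_URL = 'http://www.whoscored.com/tournamentsfeed/%s/Fixtures/'


def build_feed_request(season_id, day):
    """Return the (url, params) pair for one season ID and one calendar day."""
    url = FEED_URL % season_id
    # Same date format as str(day).replace('-', '') in the original script.
    params = {'d': day.strftime('%Y%m%d'), 'isAggregate': 'false'}
    return url, params


url, params = build_feed_request(89, date(2000, 11, 1))
print(url)
print(params)
```

The resulting pair can be passed to `requests.get(url, params=params, headers=headers)` exactly as in the question's loop. No request is sent by the sketch itself.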

The dictionary to use for this should be:

year_tournament_map = {
 1999: 2,
 2000: 89,
 2001: 213,
 2002: 359,
 2003: 542,
 2004: 803,
 2005: 1208,
 2006: 937,
 2007: 2025,
 2008: 2539,
 2009: 3115,
 2010: 4345,
 2011: 5476,
 2012: 6531,
 2013: 7794,
 2014: 9155, }
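To see how far the two sets of identifiers diverge, the dictionary from the question can be compared with the one above. A short sketch, assuming the question's dictionary holds the IDs shown in the address bar; the names `address_bar_ids`, `xhr_ids`, and `differing` are hypothetical:

```python
# IDs from the question (assumed to come from the address bar).
address_bar_ids = {
    1999: 2, 2000: 85, 2001: 191, 2002: 299, 2003: 429, 2004: 594,
    2005: 836, 2006: 667, 2007: 1256, 2008: 1539, 2009: 1849,
    2010: 2458, 2011: 2935, 2012: 3389, 2013: 3853, 2014: 4311,
}

# IDs from the answer (observed in the logged XHR requests).
xhr_ids = {
    1999: 2, 2000: 89, 2001: 213, 2002: 359, 2003: 542, 2004: 803,
    2005: 1208, 2006: 937, 2007: 2025, 2008: 2539, 2009: 3115,
    2010: 4345, 2011: 5476, 2012: 6531, 2013: 7794, 2014: 9155,
}

# Map each season whose two IDs differ to (address bar ID, XHR ID).
differing = {
    year: (address_bar_ids[year], xhr_ids[year])
    for year in xhr_ids
    if address_bar_ids[year] != xhr_ids[year]
}

for year in sorted(differing):
    print("%d: address bar %d, XHR %d" % ((year,) + differing[year]))
```

Only the 1999 entry is the same in both dictionaries, which is consistent with the first season in the question returning the expected data while later seasons did not.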

Thanks
