Screen scraping advice: Interactive graph

Question

I have recently followed some tutorials on using BeautifulSoup with Python and have learnt how to scrape simple text and URLs from web pages. I am now trying to scrape data from the following link:

http://www.study.cam.ac.uk/undergraduate/apply/statistics/

There is an interactive graph generator at the bottom of the page, and I would like to scrape all the data from it without spending many hours tediously writing down by hand the values from every possible graph. I have tried my measly beginner techniques, but it is not obvious to me where in the HTML the graph data comes from. In addition, the HTML seems to change dynamically depending on where my mouse is on the screen.

The Question: Is it possible to scrape this data using these tools and, if so, how?

Recommended answer

Using the browser's developer tools, you can see that clicking the Show Graph button sends a POST request to http://www.study.cam.ac.uk/undergraduate/apply/statistics/data.php. The response is a JSON object containing all of the data needed to build the graph.

You can simulate this request in Python, for example with the requests module:

import requests

URL = "http://www.study.cam.ac.uk/undergraduate/apply/statistics/data.php"
HEADERS = {'X-Requested-With': 'XMLHttpRequest'}

data = {
    'when': 'year',
    'year': 2014,
    'applications': 'on',
    'offers': 'on',
    'acceptances': 'on',
    'groupby': 'college',
    'for-5-years-what': 'university'
}

response = requests.post(URL, data=data, headers=HEADERS)
print(response.json())  # print() as a function, so this runs on Python 3
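
To collect the data behind every graph rather than a single one, the same request could be repeated for each combination of form values. The following is only a minimal sketch: it assumes the endpoint accepts other `year` and `groupby` values in the same way as the example above, which has not been verified. The helper names `build_payload`, `iter_payloads` and `fetch_all`, and the 30-second timeout, are hypothetical and are not taken from the site.

```python
import itertools

URL = "http://www.study.cam.ac.uk/undergraduate/apply/statistics/data.php"
HEADERS = {'X-Requested-With': 'XMLHttpRequest'}


def build_payload(year, groupby):
    """Return the form data for one graph (field names copied from the request above)."""
    return {
        'when': 'year',
        'year': year,
        'applications': 'on',
        'offers': 'on',
        'acceptances': 'on',
        'groupby': groupby,
        'for-5-years-what': 'university',
    }


def iter_payloads(years, groupbys):
    """Yield one payload per (year, groupby) combination."""
    for year, groupby in itertools.product(years, groupbys):
        yield build_payload(year, groupby)


def fetch_all(years, groupbys, post):
    """Send one POST per combination and collect the decoded JSON.

    `post` is any callable with the same interface as requests.post;
    passing it in keeps this function easy to try without network access.
    """
    results = {}
    for payload in iter_payloads(years, groupbys):
        response = post(URL, data=payload, headers=HEADERS, timeout=30)
        response.raise_for_status()  # stop early on an HTTP error
        results[(payload['year'], payload['groupby'])] = response.json()
    return results


# Example usage (assumed values; requires the requests package and network access):
#     import requests
#     data = fetch_all([2013, 2014], ['college'], post=requests.post)
```

The valid values for each field can be read from the form controls on the page or from the requests shown in the developer tools; the values in the usage comment are placeholders.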

There is no need for BeautifulSoup here, at least as far as I understand your question.
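
Because the endpoint returns JSON rather than HTML, the decoded response can be stored as-is with Python's standard `json` module, with no HTML parsing step. A minimal sketch follows; the helper names `save_json` and `load_json` and the file name in the usage comment are hypothetical.

```python
import json


def save_json(data, path):
    """Write an already-decoded JSON object to disk as UTF-8 text."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(data, handle, indent=2, ensure_ascii=False)


def load_json(path):
    """Read the object back from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Example usage (assumes `response` comes from a requests.post call like the one above):
#     save_json(response.json(), "statistics_2014.json")
```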
