Screen scraping advice: Interactive graph
Question
I have recently followed some tutorials on how to use BeautifulSoup with Python and have learnt how to scrape simple text and URLs from webpages. I am now trying to scrape data from the following link:
http://www.study.cam.ac.uk/undergraduate/apply/statistics/
There is an interactive graph generator at the bottom of the page, and I would like to scrape all the data from it without spending many hours tediously writing down by hand the values from every possible graph. I have tried my measly beginner techniques, but it is not obvious to me where in the HTML the graph data comes from. In addition, the HTML seems to change dynamically depending on where my mouse is on the screen.
The question: Is it possible to scrape this data using these tools and, if so, how?
Recommended answer
Using the browser's developer tools, you can see that when you click the Show Graph button, a POST request goes to http://www.study.cam.ac.uk/undergraduate/apply/statistics/data.php. The response is a JSON object containing all of the data needed to build the graph.
Simulate this request in Python, for example with the requests module:
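import requests

# Endpoint that receives the POST request when Show Graph is clicked
URL = "http://www.study.cam.ac.uk/undergraduate/apply/statistics/data.php"
# Header that identifies the request as an XMLHttpRequest (AJAX) call
HEADERS = {'X-Requested-With': 'XMLHttpRequest'}

# Form fields sent with the request
data = {
    'when': 'year',
    'year': 2014,
    'applications': 'on',
    'offers': 'on',
    'acceptances': 'on',
    'groupby': 'college',
    'for-5-years-what': 'university'
}

response = requests.post(URL, data=data, headers=HEADERS)
# Parse the JSON body and print it (print is a function in Python 3)
print(response.json())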
No need for BeautifulSoup here, at least from what I've understood from your question.
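Since the goal is the data behind every graph rather than a single one, the request above could be repeated in a loop. The following is only a rough sketch: the helper names build_payload and fetch_all are made up for illustration, the form fields are copied from the request above, and the year range in the usage comment is an assumption. I have not checked which years or groupby values the server actually accepts, so inspect the form's drop-downs in the developer tools first.

```python
URL = "http://www.study.cam.ac.uk/undergraduate/apply/statistics/data.php"
HEADERS = {"X-Requested-With": "XMLHttpRequest"}


def build_payload(year, groupby="college"):
    """Form fields from the request above, with year and grouping made variable."""
    return {
        "when": "year",
        "year": year,
        "applications": "on",
        "offers": "on",
        "acceptances": "on",
        "groupby": groupby,
        "for-5-years-what": "university",
    }


def fetch_all(session, years):
    """POST once per year and return {year: parsed JSON}.

    `session` is expected to be a requests.Session (or any object with a
    compatible post() method).
    """
    results = {}
    for year in years:
        response = session.post(
            URL, data=build_payload(year), headers=HEADERS, timeout=30
        )
        response.raise_for_status()  # stop on an HTTP error instead of parsing junk
        results[year] = response.json()
    return results


# Usage (needs the requests package and network access; the year range
# below is an assumption, not taken from the site):
#   import requests
#   data = fetch_all(requests.Session(), range(2010, 2015))
```

Passing the session in keeps the looping logic separate from the network, which makes it easy to try the function against a stub before hitting the real server.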