Scrape a webpage with AJAX using Python

Question

I know the basics of scraping HTML with Python's Beautiful Soup. However, this soccer statistics page makes an AJAX call to get data on minutes played by a player. (I identified the network call using Firebug.)
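
Since the network call is visible in Firebug, one option is to request that same URL directly from Python and skip the HTML page altogether. The following is only a minimal sketch of that idea: the endpoint `AJAX_URL`, the request headers, and the JSON layout (`players`, `name`, `minutes`) are all assumptions made for illustration, not details of the actual site.

```python
import json
from urllib.request import Request, urlopen

# Hypothetical endpoint: replace it with the URL shown in the browser's network panel.
AJAX_URL = "https://example.com/stats/player?id=123"


def fetch_json(url, timeout=10):
    """Request the AJAX endpoint directly and decode its JSON body."""
    # Some sites only answer requests that look like XHR calls (an assumption here).
    request = Request(
        url,
        headers={"X-Requested-With": "XMLHttpRequest", "User-Agent": "Mozilla/5.0"},
    )
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def minutes_by_player(payload):
    """Map each player name to minutes played, given the assumed payload layout."""
    return {row["name"]: row["minutes"] for row in payload["players"]}


if __name__ == "__main__":
    # Offline demonstration with a hand-written payload in the assumed layout.
    sample = json.loads('{"players": [{"name": "Player A", "minutes": 90}]}')
    print(minutes_by_player(sample))
```

If the endpoint returns an HTML fragment rather than JSON, the response body can be handed to Beautiful Soup as usual.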

My question: is it even possible to use Python to "scrape" this information? What tools would I need, and what beyond HTML should I know? (I'm currently reading up on JavaScript and AJAX.)

I apologize for this non-specific question, but I don't even know how to Google about tools that may or may not exist.

UPDATE: After a few days I came up with a solution using Selenium in Python in conjunction with PhantomJS. I basically used Selenium to go to each link, waited for the page to load, then scraped the information. PhantomJS serves as the headless webdriver in Selenium.
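
The workflow in the update (open a link, wait for the page to load, then scrape) can be sketched roughly as follows. This is not the original poster's code. As far as I know, recent Selenium releases no longer ship a PhantomJS driver, so the sketch assumes headless Chrome instead. The helper names `fetch_rendered_html` and `extract_cells`, and the assumption that the data sits in `<td>` cells, are hypothetical. The standard-library `html.parser` is used only to keep the example self-contained; Beautiful Soup would work equally well.

```python
from html.parser import HTMLParser


def fetch_rendered_html(url, css_selector, timeout=10):
    """Load a page in headless Chrome and return its HTML after AJAX content appears."""
    # Imported here so the parsing helpers below work without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until an element matching the selector exists, i.e. the AJAX data arrived.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return driver.page_source
    finally:
        driver.quit()


class CellCollector(HTMLParser):
    """Collect the text of every <td> cell in document order."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td" and self._in_cell:
            self._in_cell = False
            self.cells[-1] = self.cells[-1].strip()

    def handle_data(self, data):
        if self._in_cell:
            self.cells[-1] += data


def extract_cells(html):
    """Return the stripped text of all table cells found in the HTML."""
    parser = CellCollector()
    parser.feed(html)
    return parser.cells
```

A call such as `extract_cells(fetch_rendered_html(link, "table td"))` would then be repeated for each link, where `link` and the selector are placeholders for the real page.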

I understand why mods want to close this, but the advice people gave me here was extremely helpful since they launched me into the right direction. My question wasn't too much about what tool is best either, but more about how I can do this in Python.

Recommended answer

Using Python is unnecessary and will not work in many cases. The best way is to run a proper browser and use JavaScript to do all the scraping, since it has access to the whole DOM and you can even bind to events.

There are many good headless browsers with scripting support. My favourite is PhantomJS: you can use it to load webpages and scrape them, or save them as images, e.g.

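var page = require('webpage').create();
// Load the page, then save a screenshot once loading has finished.
page.open('http://github.com/', function (status) {
    if (status === 'success') {
        page.render('github.png');  // write the rendered page to an image file
    }
    phantom.exit();
});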

There are also scraping frameworks built on top of PhantomJS, such as pjscrape.
