Scrape a dynamic website
Question
What is the best method to scrape a dynamic website where most of the content is generated by what appear to be Ajax requests? I have previous experience with a combination of Mechanize, BeautifulSoup, and Python, but I am up for something new.
--Edit-- For more detail: I'm trying to scrape the CNN primary database. There is a wealth of information there, but there doesn't appear to be an API.
Recommended answer
The best solution I found was to use Firebug to monitor the XMLHttpRequests the page makes, and then to use a script to resend them.
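As a rough illustration of the "resend them" step, here is a minimal Python sketch that uses only the standard library. The endpoint URL, query parameters, and header values are hypothetical placeholders, not anything taken from the CNN site; in practice you would copy them from the request captured in Firebug (or any browser network panel). It also assumes the endpoint answers a GET request with UTF-8 encoded JSON, which may not hold for every site.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical values: copy the real URL, query parameters, and headers
# from the request captured in the browser's network panel.
ENDPOINT = "https://example.com/ajax/results"
PARAMS = {"page": 1, "state": "IA"}
HEADERS = {
    "X-Requested-With": "XMLHttpRequest",  # many Ajax endpoints check this
    "Accept": "application/json",
    "User-Agent": "Mozilla/5.0",
}


def build_request(endpoint, params, headers):
    """Rebuild the captured GET request as a urllib Request object."""
    url = endpoint + "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(url, headers=headers)


def fetch_json(request, timeout=10):
    """Send the request and decode the body, assuming UTF-8 JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    req = build_request(ENDPOINT, PARAMS, HEADERS)
    print(req.full_url)
    # data = fetch_json(req)  # uncomment once ENDPOINT is a real URL
```

Once the endpoint is known, the response is usually structured data already, so there is often no HTML left to parse with BeautifulSoup. If the captured request is a POST, or depends on cookies from an earlier page load, the sketch would need to be extended accordingly.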