How to scrape AJAX-loaded content with jsoup
Question
I have been using jsoup for scraping, and it works perfectly as long as AJAX and JavaScript play no part in rendering the page content.
Does anyone have a clue how to scrape content that is displayed by AJAX or JavaScript after the page has finished loading?
Thanks in advance!
Recommended answer
You can use a headless browser such as PhantomJS.
PhantomJS is a headless WebKit browser scriptable with a JavaScript API. It has fast, native support for various web standards: DOM handling, CSS selectors, JSON, Canvas, and SVG.
To make the work easier, you could use CasperJS.
CasperJS is a companion to PhantomJS that provides a greatly improved API to ease the creation of scraping and automation workflows.
These tools are very useful when you have to scrape websites with dynamic content, for instance sites where the content only appears after JavaScript has run in the page (sometimes including AJAX calls).
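One way to combine a headless browser with jsoup is to let PhantomJS render the page and hand the resulting HTML to jsoup afterwards. The following is only a rough sketch under several assumptions: the class name `RenderedPageFetcher`, its method names, and the fixed 2000 ms wait are made up for illustration, a `phantomjs` binary is assumed to be on the PATH, and a fixed delay is a crude stand-in for properly detecting when the AJAX calls have finished.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class RenderedPageFetcher {

    // PhantomJS script: open the URL passed as the first argument, wait a fixed
    // delay so in-page JavaScript and AJAX calls can run, then print the DOM.
    // The 2000 ms delay is an arbitrary assumption; tune it for the target site.
    public static final String RENDER_SCRIPT = String.join("\n",
        "var system = require('system');",
        "var page = require('webpage').create();",
        "page.open(system.args[1], function (status) {",
        "  if (status !== 'success') { phantom.exit(1); return; }",
        "  setTimeout(function () {",
        "    console.log(page.content);",
        "    phantom.exit(0);",
        "  }, 2000);",
        "});");

    // Builds the command line: phantomjs <script> <url>
    public static List<String> buildCommand(String phantomBinary, String scriptPath, String url) {
        return Arrays.asList(phantomBinary, scriptPath, url);
    }

    // Writes the script to a temp file, runs PhantomJS, and returns its stdout,
    // which is the HTML of the page after JavaScript has executed.
    public static String fetchRenderedHtml(String phantomBinary, String url)
            throws IOException, InterruptedException {
        Path script = Files.createTempFile("render", ".js");
        try {
            Files.write(script, RENDER_SCRIPT.getBytes(StandardCharsets.UTF_8));
            Process process = new ProcessBuilder(
                    buildCommand(phantomBinary, script.toString(), url))
                    .redirectError(ProcessBuilder.Redirect.INHERIT)
                    .start();
            byte[] out = process.getInputStream().readAllBytes(); // requires Java 9+
            int exitCode = process.waitFor();
            if (exitCode != 0) {
                throw new IOException("phantomjs exited with code " + exitCode);
            }
            return new String(out, StandardCharsets.UTF_8);
        } finally {
            Files.deleteIfExists(script);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.err.println("usage: RenderedPageFetcher <url>");
            return;
        }
        String html = fetchRenderedHtml("phantomjs", args[0]);
        // With jsoup on the classpath, the rendered markup can then be parsed as usual:
        //   Document doc = Jsoup.parse(html, args[0]);
        System.out.println(html.length() + " characters of rendered HTML");
    }
}
```

Since the returned string is ordinary HTML, any existing jsoup selectors should keep working on it; only the fetching step changes.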
You can see an example of how CasperJS works here:
CasperJS and jQuery with chained selects