Scraping sites with a JavaScript screen delay


Question

I'm attempting to scrape a site that has a split-second JavaScript delay.

I'm currently using Python for scraping. Whenever I 'get' the page, the JavaScript delay has not finished, and the new DOM has not fully loaded yet.

How would I scrape such a page?
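
To make the problem concrete, here is a minimal sketch of what a plain HTTP 'get' sees. The page source, the `result` id, and the 500 ms timer are all hypothetical stand-ins, not taken from the real site. The point is only that a static parser reads the markup as delivered and never runs the script that fills in the content.

```python
from html.parser import HTMLParser

# Hypothetical page source, as a plain HTTP GET would return it.
# The script would replace the placeholder text about 500 ms after load,
# but only inside a real browser that executes JavaScript.
RAW_HTML = """
<html><body>
  <div id="result">Loading...</div>
  <script>
    setTimeout(function () {
      document.getElementById("result").textContent = "42 items";
    }, 500);
  </script>
</body></html>
"""


class TextById(HTMLParser):
    """Collect the text found inside elements with a given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0   # > 0 while we are inside the target element
        self.text = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.text.append(data)


def static_text(html, element_id):
    """Return the text of the element as it appears in the raw markup."""
    parser = TextById(element_id)
    parser.feed(html)
    return "".join(parser.text).strip()


print(static_text(RAW_HTML, "result"))
```

The parser only ever sees the placeholder, because the `setTimeout` callback is just text to it. Getting the final content requires something that actually executes the script and then waits for the DOM to change.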

Recommended answer

You can extend Mozilla to build a web scraper that leverages the full power of the web browser. Once all the data has loaded and the DOM has been built, you can extract the data you need from the DOM using XSLT. If the DOM changes dynamically after the initial load, there are several ways to wait for those changes. Visit http://www.gooseeker.com for more information. GooSeeker publishes a similar tool that is free for everyone. Most of its code is readable JavaScript, so you can see how it works.
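
One common way to wait for DOM changes is to poll until the expected content appears or a timeout expires. The sketch below is a generic illustration of that idea in Python, not GooSeeker's implementation. The names `wait_until`, `FakePage`, and `find_text` are hypothetical. In a real scraper, the condition would query the DOM of a page rendered by an actual browser instead of the simulated page used here.

```python
import time


def wait_until(condition, timeout=5.0, interval=0.1):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


class FakePage:
    """Hypothetical stand-in for a browser-rendered page.

    The content only becomes available after `delay` seconds, which
    imitates a script that updates the DOM shortly after the initial load.
    """

    def __init__(self, delay):
        self._ready_at = time.monotonic() + delay

    def find_text(self):
        # Returns None until the simulated script has "run".
        if time.monotonic() >= self._ready_at:
            return "42 items"
        return None


page = FakePage(delay=0.3)
print(wait_until(page.find_text, timeout=2.0))
```

Polling on a specific condition is usually more reliable than a fixed sleep: it returns as soon as the content is there and fails loudly if it never shows up.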
