How to make mechanize wait for a web page to fully load?

Question

I want to scrape a web page that loads its components dynamically. The page has an onload script, and the complete page appears 3-5 seconds after I enter the URL in my browser.

The problem is that when I call br.open('URL'), the response is the page as it exists at 0 seconds. The HTML I want only exists 3-5 seconds later, so it differs from the result of br.open('URL').

Recommended answer

Working with a JavaScript-heavy web page in mechanize is not easy. mechanize does not execute JavaScript, so waiting longer will not change what br.open returns. Depending on the situation, though, there are several ways to get the content you want.

If the page builds its content from JSON requests, you can call those URLs directly, parse the responses, and then join the pieces together yourself.
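A minimal sketch of the JSON approach follows. It is an illustration under assumptions, not the page's actual API: the endpoint URLs, the "items" key, and the helper names (parse_body, fetch_json, join_items) are all hypothetical. The browser argument is expected to behave like mechanize.Browser, whose open() returns a response object with a read() method.

```python
import json


def parse_body(body):
    """Decode a raw response body (bytes or str) as JSON."""
    if isinstance(body, bytes):
        body = body.decode("utf-8")
    return json.loads(body)


def fetch_json(browser, url):
    """Open `url` with a mechanize-style browser and return the decoded JSON."""
    response = browser.open(url)
    return parse_body(response.read())


def join_items(payloads, key="items"):
    """Concatenate the list stored under `key` in each payload, in order.

    The "items" key is an assumption; use whatever key the real
    endpoint returns.
    """
    merged = []
    for payload in payloads:
        merged.extend(payload.get(key, []))
    return merged


# Usage (needs network access and the mechanize package; URLs are placeholders):
#   import mechanize
#   br = mechanize.Browser()
#   urls = ["https://example.com/api/items?page=1",
#           "https://example.com/api/items?page=2"]
#   print(join_items([fetch_json(br, u) for u in urls]))
```

The real request URLs can usually be found in the network tab of the browser's developer tools while the page loads.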

If you need to submit forms, you can create the form fields and set their values within mechanize. Alternatively, write a function that encodes your POST or GET data (quoting special characters and so on) and send it with the mechanize.Browser.open method.
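The encode-and-send variant might look like the sketch below. The helper names (encode_fields, send_get, send_post) and the field names are made up for illustration. The browser argument is assumed to be a mechanize.Browser, whose open() accepts an optional data argument for POST bodies; the body is passed as bytes here as a cautious choice.

```python
from urllib.parse import urlencode


def encode_fields(fields):
    """Percent-encode a dict of form fields into a query or body string."""
    return urlencode(fields)


def send_get(browser, url, fields):
    """Send the fields as a GET query string.

    Assumes `url` does not already contain a query string.
    """
    return browser.open(url + "?" + encode_fields(fields))


def send_post(browser, url, fields):
    """Send the fields as a form-encoded POST body."""
    body = encode_fields(fields).encode("ascii")
    return browser.open(url, data=body)


# Usage (needs network access and the mechanize package; URL is a placeholder):
#   import mechanize
#   br = mechanize.Browser()
#   response = send_post(br, "https://example.com/search", {"q": "a b&c"})
#   html = response.read()
```

urlencode takes care of the quoting: spaces become "+" and characters such as "&" are percent-escaped.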

If the page relies on JavaScript-based security functions (such as a special encoding applied to form data before it is posted), you can use a JavaScript runtime such as node.js to run the relevant blocks of JavaScript.
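One possible way to do this from Python is to hand the extracted JavaScript to a node process and read back the result. This is only a sketch under assumptions: calling node through subprocess is my choice rather than something the answer prescribes, the helper names are hypothetical, and it assumes a node binary is on the PATH and that the page's function returns a JSON-serializable value.

```python
import json
import subprocess


def build_node_script(js_source, func_name, args):
    """Return a Node.js script that defines the page's JavaScript and
    prints the result of func_name(*args) as JSON."""
    call_args = ", ".join(json.dumps(a) for a in args)
    call = "%s(%s)" % (func_name, call_args)
    return js_source + "\nconsole.log(JSON.stringify(" + call + "));\n"


def run_in_node(script, node_binary="node"):
    """Run the script with `node -e` and decode its JSON output.

    Raises CalledProcessError if node exits with an error.
    """
    result = subprocess.run(
        [node_binary, "-e", script],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


# Usage (needs node installed; `encodeForm` stands in for the page's own function):
#   js = "function encodeForm(s) { return s.split('').reverse().join(''); }"
#   encoded = run_in_node(build_node_script(js, "encodeForm", ["secret"]))
```

The encoded value can then be posted with mechanize like any other form field.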

In practice, though, some of these options are not easy to implement, so think twice before using mechanize for this kind of project.
