Parsing a web page containing dynamic JavaScript objects
Question
Currently I'm using Python and its urllib2 and urllib modules to retrieve simple static web pages. Everything went smoothly until the page's developers added JavaScript. Now the most interesting information is hidden behind scripts:
<a href="javascript://" class="event-more-view" id="view-moreid-12311" onclick="Markets.applyView(this);return false;" treeid="1291266" eventstate ="false" > add table </a>
The browser preloads the data and shows it when the "a href" link is clicked. My brief research turned up jsoup and HtmlUnit. Am I digging in the right direction? What are the pros and cons?
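If the data really is preloaded into the page, one browser-free first step is to pull the attributes off those links and see what they carry. The sketch below uses only the standard library's html.parser on the anchor quoted above. The names EventLinkParser and find_event_links are made up for illustration, and whether treeid maps to anything useful on the server side is an assumption that would need checking against the real site.

```python
from html.parser import HTMLParser


class EventLinkParser(HTMLParser):
    """Collect the attributes of every <a class="event-more-view"> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # The class attribute may hold several space-separated names.
        classes = (attr_map.get("class") or "").split()
        if "event-more-view" in classes:
            self.links.append(attr_map)


def find_event_links(html):
    """Return a list of attribute dicts, one per matching link (hypothetical helper)."""
    parser = EventLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


# The anchor from the question, used here as sample input.
SAMPLE = (
    '<a href="javascript://" class="event-more-view" id="view-moreid-12311" '
    'onclick="Markets.applyView(this);return false;" treeid="1291266" '
    'eventstate ="false" > add table </a>'
)

for link in find_event_links(SAMPLE):
    print(link.get("id"), link.get("treeid"), link.get("eventstate"))
```

This only reads what is already in the HTML source. It does not run Markets.applyView, so anything the script builds or fetches after the click still needs a tool that executes JavaScript.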
Will Python help? Should I be using Java? Which packages can help with dynamic content? Which is simpler?
In my case I have to create some sort of virtual browser, because the built-in scripts refresh the data over time and that data has to be processed.
Recommended answer
You are digging in the right direction.
Here are some options/tools to consider:
- ghost.py
- HtmlUnit under Jython
- Selenium
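As a rough idea of what the Selenium route could look like from Python, here is a minimal sketch. The helper name expand_and_snapshot, the a.event-more-view selector (taken from the class on the link in the question) and the fixed settle delay are all assumptions for illustration, not a tested recipe for that site. The helper takes an already created driver, so it does not import selenium itself.

```python
import time


def expand_and_snapshot(driver, url, selector="a.event-more-view", settle_seconds=1.0):
    """Load url, click every link matching selector, return the rendered HTML.

    driver is expected to be a Selenium WebDriver instance (or anything with
    the same get / find_elements / page_source interface).
    """
    driver.get(url)
    # "css selector" is the string value behind selenium's By.CSS_SELECTOR.
    links = driver.find_elements("css selector", selector)
    for link in links:
        # Runs the link's onclick handler in the real browser.
        link.click()
    # Crude wait so the scripts can update the page; an explicit wait
    # on a known element would be more robust.
    time.sleep(settle_seconds)
    return driver.page_source
```

With selenium installed, this could be driven with something like driver = webdriver.Firefox() followed by expand_and_snapshot(driver, url) and driver.quit() when done. If a click re-renders the list, the remaining link references may go stale and would have to be looked up again. Since the scripts refresh the data over time, calling driver.page_source again later on the same open driver should return the updated DOM.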
See also:
- Click on a javascript link within python?
- Simulating clicking on a javascript link in python
Hope that helps.