Parsing a web page containing dynamic JavaScript objects

Question

Currently I'm using Python and its urllib2 and urllib modules to retrieve simple static web pages. Everything went smoothly until the page's developers added JavaScript. Now the most interesting information is hidden behind the scripts:

<a href="javascript://" class="event-more-view" id="view-moreid-12311" onclick="Markets.applyView(this);return false;" treeid="1291266" eventstate ="false" > add table </a>

The browser preloads the data and shows it when the "a href" link is clicked. My brief research turned up jsoup and HtmlUnit. Am I digging in the right direction? What are the pros and cons?
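If the data really is preloaded, a quick first check is to pull the attributes of such links out of the static HTML. The following is a minimal sketch using only the Python 3 standard library (the urllib2 module mentioned above is Python 2). `LinkCollector` is a hypothetical helper name, and the sample markup is the link quoted above; whether the preloaded data itself sits in the page source is an assumption that has to be verified against the real page.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the attributes of every <a> tag carrying a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # The class attribute may hold several space-separated class names.
        classes = (attributes.get("class") or "").split()
        if self.css_class in classes:
            self.links.append(attributes)


# Sample markup: the link quoted in the question.
html = ('<a href="javascript://" class="event-more-view" id="view-moreid-12311" '
        'onclick="Markets.applyView(this);return false;" treeid="1291266" '
        'eventstate ="false" > add table </a>')

collector = LinkCollector("event-more-view")
collector.feed(html)
for link in collector.links:
    print(link["id"], link["treeid"], link["onclick"])
```

This only reads what is already in the markup. It does not run `Markets.applyView`, so anything the script computes or fetches later still needs a tool that executes JavaScript.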

Will Python help? Should I be using Java? Which packages can help with dynamic content? Which is simpler?

In my case I have to create some sort of virtual browser, because the built-in scripts refresh the data over time, and that data has to be processed.

Recommended answer

You are digging in the right direction.

Here are some options/tools to consider:

  • ghost.py
  • htmlunit under jython
  • selenium
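As an illustration of the selenium option, here is a rough sketch that loads a page, clicks a JavaScript link, and then takes several snapshots of the rendered HTML, since the scripts refresh the data over time. It is not taken from the original answer. The function names `make_driver` and `snapshots_after_click`, the choice of headless Firefox, and the 10-second implicit wait are all assumptions made for the example; it presumes the `selenium` package and Firefox are installed.

```python
import time


def make_driver():
    """Start a headless Firefox through Selenium (assumed to be installed)."""
    from selenium import webdriver  # imported lazily; only needed for a real browser

    options = webdriver.FirefoxOptions()
    options.add_argument("-headless")
    driver = webdriver.Firefox(options=options)
    driver.implicitly_wait(10)  # wait up to 10 s for elements to appear
    return driver


def snapshots_after_click(driver, url, selector, count=3, interval=1.0):
    """Load url, click the first element matching the CSS selector, then
    return `count` copies of the rendered HTML taken `interval` seconds apart."""
    driver.get(url)
    # "css selector" is the locator strategy string behind Selenium's By.CSS_SELECTOR.
    driver.find_element("css selector", selector).click()
    pages = []
    for i in range(count):
        if i:
            time.sleep(interval)  # give the page's scripts time to refresh the data
        pages.append(driver.page_source)
    return pages
```

A typical call would be `driver = make_driver()`, then `snapshots_after_click(driver, url, "a.event-more-view")` with the real page URL, and finally `driver.quit()`. Each returned snapshot is ordinary HTML, so it can be handed to whatever parser was already in use for the static pages.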

See also:

  • Click on a javascript link within python?
  • Simulating clicking on a javascript link in python

Hope that helps.
