Python library for rendering HTML and JavaScript


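Question

Is there a Python module that renders an HTML page, executes its JavaScript, and returns a DOM object?

I want to parse a page that generates almost all of its content with JavaScript.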

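Answer

The main difficulty here is emulating a full browser environment outside of a browser. Standalone JavaScript interpreters such as Rhino and SpiderMonkey can run JavaScript code, but they do not provide the complete browser-like environment needed to fully render a web page.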



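If I needed to solve a problem like this, I would first look at how the JavaScript renders the page. Quite possibly it fetches data via AJAX and uses that to build the page. In that case I could use Python libraries such as simplejson and httplib2 to fetch the data directly, which removes the need to access the DOM at all. That is only one possible situation, though; I don't know the exact problem you are solving.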
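As a rough sketch of the "fetch the data directly" idea: the snippet below uses the standard library's `json` and `urllib.request` in place of simplejson and httplib2 (a substitution made here only to keep the example dependency-free). The endpoint URL and the `items`/`title` layout of the payload are hypothetical; in practice you would find the real request in the browser's network inspector.

```python
import json
import urllib.request


def fetch_json(url, timeout=10):
    """Fetch a URL and decode the response body as JSON."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return json.loads(response.read().decode(charset))


def extract_titles(payload):
    """Pull the fields of interest out of the decoded data.

    The "items"/"title" structure is an assumed, hypothetical shape.
    """
    return [item["title"] for item in payload.get("items", [])]


if __name__ == "__main__":
    # Hypothetical endpoint; replace with the URL the page's script calls.
    data = fetch_json("https://example.com/api/items")
    print(extract_titles(data))
```

If the data arrives this way, no JavaScript has to be executed at all, which is usually faster and less fragile than rendering the page.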

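Other options include the Selenium approach mentioned by Łukasz, some kind of embedded WebKit craziness, some kind of IE win32 scripting craziness, or, finally, a pyxpcom-based solution (with added craziness). All of these share one drawback: they require pretty much a fully running web browser for Python to drive, which might not be an option depending on your environment.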
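For the Selenium route, a minimal sketch might look like the following. It assumes the `selenium` package and a Firefox driver are installed; the `render_page` helper and its `driver_factory` parameter are names invented for this illustration, not part of Selenium. It returns the serialized HTML after the browser has loaded the page, which you can then hand to an HTML parser of your choice.

```python
def render_page(url, driver_factory=None):
    """Load url in a real browser and return the page's HTML source.

    driver_factory is a hypothetical hook: any callable returning an
    object with get(), page_source and quit(), as Selenium drivers do.
    """
    if driver_factory is None:
        # Imported lazily so this module loads even without Selenium.
        from selenium import webdriver
        driver_factory = webdriver.Firefox

    driver = driver_factory()
    try:
        driver.get(url)
        # page_source reflects the DOM at the moment it is read.
        return driver.page_source
    finally:
        # Always shut the browser down, even if loading fails.
        driver.quit()


if __name__ == "__main__":
    print(render_page("https://example.com/")[:200])
```

Note that content loaded asynchronously after the initial page load may not be present yet when `page_source` is read; pages like that generally need an explicit wait before the source is captured.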


