Grabbing non-HTML data from a website using Python

Question

I'm trying to get the current contract prices on this page into a string: http://www.cmegroup.com/trading/equity-index/us-index/e-mini-sandp500.html

I would really like a python 2.6 solution.

It was easy to get the page HTML using urllib, but it seems like this number is live and not in the HTML. I inspected the element in Chrome and it's some td class thing.

But I don't know how to get at this with Python. I tried BeautifulSoup (but after several attempts gave up getting a tar.gz to work on my Windows x64 system), and then ElementTree, but really my programming interest is data analysis. I'm not a website designer and don't really want to become one, so it's all kind of a foreign language. Is this live price XML?

Any assistance gratefully received. Ideally a simple to install module and some actual code, but all hints and tips very welcome.

Recommended answer

It looks like the numbers in the table are filled in by JavaScript, so fetching the HTML with urllib or another library won't be enough: those libraries download the page source but don't run its JavaScript. You'll need a library like PyQt to simulate a browser, rendering the page and executing the JS that fills in the numbers, and then scrape the resulting HTML.
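Below is a minimal sketch of the rendering step. It is written for Python 3, not the Python 2.6 the question asked for. The linked post dates from 2009 and uses the older QtWebKit module; this version assumes PyQt5 with Qt WebEngine instead (installed with pip install PyQt5 PyQtWebEngine). It loads the page, waits a fixed delay after the load event so scripts can fill in the table, and returns the DOM serialized as HTML. The timeout and delay values are guesses, not tuned for the CME page.

```python
import sys


def render_html(url, timeout_ms=15000, settle_ms=2000):
    """Load url in Qt WebEngine, let its JavaScript run, and return the
    resulting DOM serialized as HTML (or None if the timeout hits first)."""
    # Imported here so this module can be imported without PyQt5 installed.
    # QtWebEngineWidgets must be imported before the QApplication exists.
    from PyQt5.QtCore import QTimer, QUrl
    from PyQt5.QtWebEngineWidgets import QWebEnginePage
    from PyQt5.QtWidgets import QApplication

    app = QApplication.instance() or QApplication(sys.argv)
    page = QWebEnginePage()
    result = {}

    def on_html(html):
        # toHtml() is asynchronous: it passes the markup to this callback.
        result["html"] = html
        app.quit()

    def on_load_finished(ok):
        # The load event can fire before scripts have filled in the table,
        # so wait settle_ms (an arbitrary guess) before serializing the DOM.
        QTimer.singleShot(settle_ms, lambda: page.toHtml(on_html))

    page.loadFinished.connect(on_load_finished)
    QTimer.singleShot(timeout_ms, app.quit)  # give up if nothing arrives
    page.load(QUrl(url))
    app.exec_()  # runs the Qt event loop until app.quit() is called
    return result.get("html")
```

Usage: html = render_html("http://www.cmegroup.com/trading/equity-index/us-index/e-mini-sandp500.html"), then search or parse html for the price cells. A fixed delay is the simplest approach; polling the DOM until the cells are non-empty would be more robust but longer.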

See this blog post on working with PyQt: http://blog.motane.lu/2009/07/07/downloading-a-pages-content-with-python-and-webkit/
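The "scrape the resulting HTML" step does not need a third-party parser. The standard library's html.parser can handle it, which avoids the BeautifulSoup install trouble mentioned in the question. The following is a minimal Python 3 sketch that assumes the prices sit in td cells identified by a CSS class. The class names in the example (last, change) are made up, so substitute whatever Chrome's inspector shows on the real page. The function works on any HTML string, whether it was fetched with urllib or produced by a browser-like renderer.

```python
from html.parser import HTMLParser


class TdTextParser(HTMLParser):
    """Collect the text inside every <td> whose class list contains css_class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.in_target = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            # attrs is a list of (name, value) pairs; value may be None.
            classes = (dict(attrs).get("class") or "").split()
            self.in_target = self.css_class in classes
            if self.in_target:
                self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_target = False

    def handle_data(self, data):
        # Text in nested tags (e.g. <span>, <b>) inside a matching cell is kept.
        if self.in_target:
            self.cells[-1] += data


def td_texts(html, css_class):
    """Return the stripped text of each matching <td>, in document order."""
    parser = TdTextParser(css_class)
    parser.feed(html)
    parser.close()
    return [cell.strip() for cell in parser.cells]
```

For example, td_texts('<td class="quote last"><b>1234.50</b></td>', "last") returns ['1234.50']. If the same call on the urllib-fetched page returns empty strings or placeholders, that confirms the values are being filled in by scripts after the page loads.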
