Extracting data from the web

Problem description

A real newbie question: I'm working on a small Python script for home use that will collect data on a specific air ticket.

I want to extract the data from Skyscanner (using BeautifulSoup and urllib). Example:

http://www.skyscanner.net/flights/lond/rome/120922/120929/airfares-from-london-to-rome-in-september-2012.html

I'm interested in all the data stored in this kind of element, especially the price: http://shrani.si/f/1w/An/1caIzEzT/capture.png

Since these values are not in the HTML, can I still extract them?

Recommended answer

I believe the problem is that these values are rendered by JavaScript code, which your browser runs and urllib doesn't. You should use a library that can execute JavaScript.
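To make the answer's point concrete, here is a minimal sketch of the static approach from the question. It swaps BeautifulSoup for the standard library's `html.parser` so it runs without extra packages; the `fetch_html`, `ClassTextCollector` and `extract_by_class` helpers, the `price` class name and the `SAMPLE_PAGE` markup are all hypothetical and are not Skyscanner's real page structure. The sample page ships an empty price element that a script fills in after load, which is the situation the answer describes.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen

# Elements that never have a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


def fetch_html(url):
    """Download raw HTML the way urllib sees it: no JavaScript is executed.

    Not called in the example below. The User-Agent value is an arbitrary
    assumption; some sites reject urllib's default one.
    """
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


class ClassTextCollector(HTMLParser):
    """Collect the text inside every element whose class list contains target_class.

    Rough stdlib stand-in for BeautifulSoup's soup.find_all(class_=target_class).
    """

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.results = []
        self._depth = 0      # > 0 while inside a matching element
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._depth:      # nested tag inside a match: track depth only
            self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:  # closed the matching element itself
            self.results.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def extract_by_class(html, target_class):
    parser = ClassTextCollector(target_class)
    parser.feed(html)
    parser.close()
    return parser.results


# Hypothetical page: the price container is empty in the HTML and a script
# fills it in after load, so the value exists only inside the script text.
SAMPLE_PAGE = """
<div class="result">
  <span class="price"></span>
  <script>document.querySelector('.price').textContent = '£120';</script>
</div>
"""
```

Here `extract_by_class(SAMPLE_PAGE, "price")` returns `[""]`: the element is found, but it is empty, because nothing ran the script that would have filled it in.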

I googled "crawler python javascript" and found some Stack Overflow questions and answers that recommend Selenium or WebKit. You can use those libraries through Scrapy. Here are two snippets:

Render/interact with JavaScript using GTK/WebKit/jswebkit

Rendered JavaScript crawler with Scrapy and Selenium RC
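Below is a minimal sketch of the Selenium route. It assumes Selenium 4 with a browser driver installed (for example Firefox with geckodriver). The second snippet title refers to Selenium RC, the older Selenium API, used together with Scrapy. This sketch instead uses the current WebDriver API directly, without Scrapy. The `collect_rendered_text` helper and the `.price` selector are hypothetical; inspect the live page to find the real selector.

```python
# Same string value as selenium.webdriver.common.by.By.CSS_SELECTOR; spelled out
# so this module can be imported even where Selenium is not installed.
CSS_SELECTOR = "css selector"


def collect_rendered_text(driver, url, selector, timeout=10):
    """Open url in the browser behind a Selenium WebDriver and return the
    stripped text of every element matching the CSS selector.

    Because a real browser executes the page's JavaScript, elements that
    scripts insert after load can be found here, unlike in raw urllib output.
    """
    # With an implicit wait, find_elements polls until at least one element
    # matches or `timeout` seconds pass. It waits for the element to exist, not
    # for its text; if the container appears before the data does, an explicit
    # WebDriverWait on the element's text would be more robust.
    driver.implicitly_wait(timeout)
    driver.get(url)
    return [el.text.strip() for el in driver.find_elements(CSS_SELECTOR, selector)]


# Usage (needs Selenium and a browser driver; not executed here).
# SEARCH_URL stands for the Skyscanner URL from the question:
#   from selenium import webdriver
#   driver = webdriver.Firefox()
#   try:
#       print(collect_rendered_text(driver, SEARCH_URL, ".price"))
#   finally:
#       driver.quit()
```

The function takes the driver as a parameter instead of creating one, so the same code works with any WebDriver browser, and the caller controls when the browser is shut down.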
