Page scraping to get prices from Google Finance


Question

I am trying to get stock prices by scraping Google Finance pages. I am doing this in Python, using the urllib package to fetch each page and a regular expression to extract the price data.

When I leave my Python script running, it works for a few minutes and then starts throwing an exception: [HTTP Error 503: Service Unavailable].

I guess this happens because the web server detects the frequent page requests as coming from a robot and starts returning this error after a while.

Is there a way around this, e.g. deleting or creating some cookie?

Even better would be an API from Google. I want to do this in Python because the complete app is written in Python, but if nothing is available in Python, I can consider alternatives. This is the Python method I call in a loop to get the data (with a few seconds of sleep between calls):

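import re
import urllib.request

# Method of the application's class (class body omitted).
def getPriceFromGOOGLE(self, symbol):
    """
    Gets the last traded price from Google for the given security.
    """
    toReturn = 0.0
    try:
        base_url = 'http://google.com/finance?q='
        req = urllib.request.Request(base_url + symbol)
        # urlopen returns bytes; decode before applying a str regex.
        content = urllib.request.urlopen(req).read().decode('utf-8', 'replace')
        # Escape the symbol so characters such as '.' match literally.
        namestr = 'name:"' + re.escape(symbol) + '",cp:(.*),p:(.*),cid(.*)}'
        m = re.search(namestr, content)
        if m:
            # Group 2 is the price; drop quotes and thousands separators.
            data = m.group(2).strip().strip('"')
            price = data.replace(',', '')
            toReturn = float(price)
        else:
            print('ERROR ' + str(symbol) + ' --- ' + str(content))
    except Exception as exc:
        # Includes the HTTP 503 errors described above.
        print('Exc: ' + str(exc))
    # Returned after the try block; a return inside finally would hide errors.
    return toReturn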

Recommended answer

There is a Google Finance API:

http://code.google.com/apis/finance/docs/2.0/developers_guide_protocol.html

And there is a Python client library for it:

http://code.google.com/p/gdata-python-client/
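
If scraping continues alongside the API, one common way to handle the 503 responses described in the question is to wait progressively longer between retries instead of sleeping a fixed few seconds. The following is a minimal sketch, assuming Python 3 and assuming the server accepts requests again after a pause, which is not confirmed for Google Finance. `fetch_with_backoff` and its parameters are hypothetical names for illustration, not part of any Google library.

```python
import time
import urllib.error
import urllib.request


def fetch_with_backoff(url, max_attempts=5, base_delay=2.0,
                       opener=urllib.request.urlopen, sleep=time.sleep):
    """Fetch url, waiting longer after each HTTP 503 before retrying.

    opener and sleep are injectable so the retry logic can be exercised
    without network access; all parameter names here are illustrative.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            with opener(url) as response:
                return response.read()
        except urllib.error.HTTPError as exc:
            # Retry only on 503; re-raise any other HTTP error, and give
            # up once the attempts are exhausted.
            if exc.code != 503 or attempt == max_attempts:
                raise
            sleep(delay)
            delay *= 2  # exponential backoff: doubles after each 503


# Possible use inside the question's method (URL taken from the question):
# content = fetch_with_backoff('http://google.com/finance?q=' + symbol)
```

The doubling delay keeps the request rate low while the server is refusing traffic, and re-raising non-503 errors avoids masking problems such as a wrong URL.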
