Python Pandas DataReader不再适用于yahoo-finance更改的url [英] Python pandas datareader no longer works for yahoo-finance changed url

查看:553
本文介绍了Python Pandas DataReader不再适用于yahoo-finance更改的url的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

由于雅虎停止了对API的支持,熊猫数据读取器现在失败了

Since yahoo discontinued their API support pandas datareader now fails

import pandas_datareader.data as web
import datetime
start = datetime.datetime(2016, 1, 1)
end = datetime.datetime(2017, 5, 17)
web.DataReader('GOOGL', 'yahoo', start, end)

HTTPError: HTTP Error 401: Unauthorized

是否有任何非官方的图书馆可让我们暂时解决此问题?可能是Quandl上的东西吗?

is there any unofficial library allowing us to temporarily work around the problem? Anything on Quandl maybe?

推荐答案

因此,他们更改了网址,现在使用cookie保护(可能还使用了javascript),因此我使用了dryscrape(它模拟了浏览器)解决了自己的问题 这只是一个仅供参考,因为这肯定会打破其条款和条件...因此使用风险自负吗?我正在寻找另一家EOD价格来源的Quandl.

So they've changed their url and now use cookies protection (and possibly javascript) so I fixed my own problem using dryscrape, which emulates a browser this is just an FYI as this surely now breaks their terms and conditions... so use at your own risk? I'm looking at Quandl for an alternative EOD price source.

我无法使用Cookie浏览CookieJar,因此我最终使用dryscrape来伪造"用户下载内容

I could not get anywhere with cookie browsing a CookieJar so I ended up using dryscrape to "fake" a user download

import dryscrape
from bs4 import BeautifulSoup
import time
import datetime
import re

#we visit the main page to initialise sessions and cookies
session = dryscrape.Session()
session.set_attribute('auto_load_images', False)
session.set_header('User-agent', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95     Safari/537.36')    

#call this once as it is slow(er) and then you can do multiple download, though there seems to be a limit after which you have to reinitialise...
session.visit("https://finance.yahoo.com/quote/AAPL/history?p=AAPL")
response = session.body()


#get the dowload link
soup = BeautifulSoup(response, 'lxml')
for taga in soup.findAll('a'):
    if taga.has_attr('download'):
        url_download = taga['href']
print(url_download)

#now replace the default end date end start date that yahoo provides
s = "2017-02-18"
period1 = '%.0f' % time.mktime(datetime.datetime.strptime(s, "%Y-%m-%d").timetuple())
e = "2017-05-18"
period2 = '%.0f' % time.mktime(datetime.datetime.strptime(e, "%Y-%m-%d").timetuple())

#now we replace the period download by our dates, please feel free to improve, I suck at regex
m = re.search('period1=(.+?)&', url_download)
if m:
    to_replace = m.group(m.lastindex)
    url_download = url_download.replace(to_replace, period1)        
m = re.search('period2=(.+?)&', url_download)
if m:
    to_replace = m.group(m.lastindex)
    url_download = url_download.replace(to_replace, period2)

#and now viti and get body and you have your csv
session.visit(url_download)
csv_data = session.body()

#and finally if you want to get a dataframe from it
import sys
if sys.version_info[0] < 3: 
    from StringIO import StringIO
else:
    from io import StringIO

import pandas as pd
df = pd.read_csv(StringIO(csv_data), index_col=[0], parse_dates=True)
df

这篇关于Python Pandas DataReader不再适用于yahoo-finance更改的url的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆