How to get ALL historical data from Yahoo without specifying a start date in pandas?

Problem description

I am learning Python and pandas for data analysis, and I am trying to program some investment ideas as exercises. pandas has a nice io.data module for pulling data from online sources such as Yahoo and Google. However, these readers all require a start date, which defaults to "2010.01.01", as specified in the following code in data.py:

http://github.com/pydata/pandas/blob/master/pandas/io/data.py:

def _sanitize_dates(start, end):
    from pandas.core.datetools import to_datetime
    start = to_datetime(start)
    end = to_datetime(end)
    if start is None:
        start = dt.datetime(2010, 1, 1)
    if end is None:
        end = dt.datetime.today()
    return start, end

Since every stock had its IPO on a different date, it is hard to specify a start date for each ticker. Wouldn't it be nice to have an option that tells pandas to read ALL data? Even for a company that has been public for 50 years, the data is only about 50*200 = 10,000 rows. Python should be able to handle that, right?
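The row estimate above can be sanity-checked with a rough sketch. It counts weekdays only, so it gives an upper bound on the number of daily rows (exchange holidays are not excluded). The helper name `count_weekdays` and the 1970 to 2020 range are illustrative assumptions, not part of pandas.

```python
from datetime import date, timedelta


def count_weekdays(start, end):
    """Count Monday-to-Friday dates in the half-open range [start, end)."""
    total = 0
    day = start
    while day < end:
        if day.weekday() < 5:  # 0..4 are Monday..Friday
            total += 1
        day += timedelta(days=1)
    return total


# Upper bound on daily rows for a hypothetical 50-year listing.
rows_upper_bound = count_weekdays(date(1970, 1, 1), date(2020, 1, 1))
print(rows_upper_bound)
```

Either way the result stays in the low tens of thousands of rows, which is small for a DataFrame.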

Thank you for your help. And my salute to Wes and the other pandas contributors; pandas is great!

Recommended answer

A simple solution is to assume a common start date that is earlier than any data the source holds. 1 January 1970 seems like a fair choice. Requesting data for AAPL from that date returns rows starting at the first date Yahoo actually has, 1984-09-07:

In [55]: from pandas.io.data import DataReader
In [56]: from datetime import datetime
In [57]: df_1=DataReader("AAPL",  "yahoo", datetime(1970,1,1), datetime(2013,10,1))
In [58]: df_1
Out[58]: 
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 7330 entries, 1984-09-07 00:00:00 to 2013-10-01 00:00:00
Data columns (total 6 columns):
Open         7330  non-null values
High         7330  non-null values
Low          7330  non-null values
Close        7330  non-null values
Volume       7330  non-null values
Adj Close    7330  non-null values
dtypes: float64(5), int64(1)

Now, if we choose 1984-09-07 as the start date, we pull the same data and end up with the same DataFrame (7330 entries over the same date range).

In [59]: df_2 = DataReader("AAPL",  "yahoo", datetime(1984,9,7), datetime(2013,10,1))
In [60]: df_2
Out[60]: 
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 7330 entries, 1984-09-07 00:00:00 to 2013-10-01 00:00:00
Data columns (total 6 columns):
Open         7330  non-null values
High         7330  non-null values
Low          7330  non-null values
Close        7330  non-null values
Volume       7330  non-null values
Adj Close    7330  non-null values
dtypes: float64(5), int64(1)
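The same idea can be wrapped in a small helper so the early start date does not have to be repeated for every ticker. This is only a sketch: `fetch_all_history`, `EARLIEST_START`, and `fake_reader` are hypothetical names introduced here, and the stand-in reader exists so the example runs without network access. It assumes the reader accepts `(symbol, source, start, end)` positionally, as `DataReader` is called in the session above.

```python
from datetime import datetime

# Assumed fallback: a date earlier than any listing we expect to query.
EARLIEST_START = datetime(1970, 1, 1)


def fetch_all_history(reader, symbol, source="yahoo", start=None, end=None):
    """Call reader(symbol, source, start, end) with an early default start."""
    if start is None:
        start = EARLIEST_START
    if end is None:
        end = datetime.today()
    return reader(symbol, source, start, end)


# Stand-in reader (hypothetical) that just echoes the arguments it receives.
def fake_reader(symbol, source, start, end):
    return {"symbol": symbol, "source": source, "start": start, "end": end}


result = fetch_all_history(fake_reader, "AAPL", end=datetime(2013, 10, 1))
print(result["start"], result["end"])
```

With the `DataReader` imported in the session above, the call would presumably look like `fetch_all_history(DataReader, "AAPL")`.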
