Python - Pandas Dataframe - data not matching source

Problem description

I'm trying to use monthly stock data from Yahoo to analyze patterns. For some reason, the monthly returns my program puts into a dataframe for a particular stock (ATVI) do not match the returns shown on the Yahoo site itself. I compared monthly returns for 2015 and included columns for the average increase and average decrease, as well as the number of occurrences of each.

Yahoo link:

My code:

from datetime import datetime
from pandas_datareader import data, wb
import pandas_datareader.data as web
import pandas as pd
from pandas_datareader._utils import RemoteDataError
import csv
import sys
import os
import time

class MonthlyChange(object):
    months = { 0:'JAN', 1:'FEB', 2:'MAR', 3:'APR', 4:'MAY',5:'JUN', 6:'JUL', 7:'AUG', 8:'SEP', 9:'OCT',10:'NOV', 11:'DEC' }

    def __init__(self,month):
        self.month = MonthlyChange.months[month-1]
        self.sum_of_pos_changes=0
        self.sum_of_neg_changes=0
        self.total_neg=0
        self.total_pos=0

    def add_change(self,change):
        if change < 0:
            self.sum_of_neg_changes+=change
            self.total_neg+=1
        elif change > 0:
            self.sum_of_pos_changes+=change
            self.total_pos+=1

    def get_data(self):
        if self.total_pos == 0:
            return (self.month,0.0,0,self.sum_of_neg_changes/self.total_neg,self.total_neg)
        elif self.total_neg == 0:
            return (self.month,self.sum_of_pos_changes/self.total_pos,self.total_pos,0.0,0)
        else:
            return (self.month,self.sum_of_pos_changes/self.total_pos,self.total_pos,self.sum_of_neg_changes/self.total_neg,self.total_neg)


for ticker in ['ATVI']:
    try:
        data = web.DataReader(ticker.strip('\n'), "yahoo", datetime(2015, 1, 1), datetime(2015, 12, 31))
        data['ymd'] = data.index
        year_month = data.index.to_period('M')
        data['year_month'] = year_month
        # first and last trading day of each month
        first_day_of_months = data.groupby(["year_month"])["ymd"].min()
        first_day_of_months = first_day_of_months.to_frame().reset_index(level=0)
        last_day_of_months = data.groupby(["year_month"])["ymd"].max()
        last_day_of_months = last_day_of_months.to_frame().reset_index(level=0)
        fday_open = data.merge(first_day_of_months,on=['ymd'])
        fday_open = fday_open[['year_month_x','Open']]
        lday_open = data.merge(last_day_of_months,on=['ymd'])
        lday_open = lday_open[['year_month_x','Open']]

        fday_lday = fday_open.merge(lday_open,on=['year_month_x'])
        monthly_changes = {i:MonthlyChange(i) for i in range(1,13)}
        for index,ym, openf,openl in fday_lday.itertuples():
            month = ym.strftime('%m')
            month = int(month)
            diff = (openf-openl)/openf
            monthly_changes[month].add_change(diff)

        changes_df = pd.DataFrame([monthly_changes[i].get_data() for i in monthly_changes],columns=["Month","Avg Inc.","Inc","Avg.Dec","Dec"])

        print(ticker)
        print(changes_df)
    except RemoteDataError:
        # assumed handler: the except clause was not shown in the original post
        print("No data for " + ticker)

Recommended answer

To get the monthly up/down price moves (the return from the first day of each month to the last), you could:

from datetime import datetime
from pandas_datareader.data import DataReader

data = DataReader('ATVI', "yahoo", datetime(2015, 1, 1), datetime(2015, 12, 31))[['Open', 'Close']]
monthly_open = data.Open.resample('M').first()    # open of the first trading day, labeled with the month-end date
monthly_close = data.Close.resample('M').last()   # close of the last trading day, labeled with the month-end date
returns = monthly_close.subtract(monthly_open).div(monthly_open)  # return relative to the month's open

to get:

Date
2014-01-31   -0.052020
2014-02-28    0.134232
2014-03-31    0.047131
2014-04-30   -0.032866
2014-05-31    0.040561
2014-06-30    0.081474
2014-07-31   -0.007539
2014-08-31    0.049020
2014-09-30   -0.124263
2014-10-31   -0.031083
2014-11-30    0.066503
2014-12-31   -0.042755
2015-01-31    0.038251
2015-02-28    0.103644
2015-03-31   -0.022366
2015-04-30    0.017387
2015-05-31    0.095879
2015-06-30   -0.046850
2015-07-31    0.042863
2015-08-31    0.121865
2015-09-30    0.108758
2015-10-31    0.124919
2015-11-30    0.089384
2015-12-31    0.003630
Freq: M, Name: Close, dtype: float64
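
The formula in the question, `(openf-openl)/openf`, subtracts in the opposite order and uses the last day's Open rather than its Close, which may explain at least part of the mismatch. Below is a minimal sketch of the difference on made-up prices (the numbers are hypothetical, not real ATVI quotes), grouping by `to_period('M')` as the question does so that it needs no network access:

```python
import pandas as pd

# Hypothetical prices for two months (not real ATVI quotes).
idx = pd.to_datetime(["2015-01-02", "2015-01-15", "2015-01-30",
                      "2015-02-02", "2015-02-13", "2015-02-27"])
prices = pd.DataFrame(
    {"Open":  [20.0, 21.0, 21.5, 22.0, 21.0, 20.5],
     "Close": [20.5, 21.2, 22.0, 21.8, 20.8, 19.8]},
    index=idx,
)

by_month = prices.groupby(prices.index.to_period("M"))
first_open = by_month["Open"].first()   # open on the first trading day of each month
last_open = by_month["Open"].last()     # open on the last trading day of each month
last_close = by_month["Close"].last()   # close on the last trading day of each month

# Formula from the question: first minus last, using the last day's Open.
question_change = (first_open - last_open) / first_open

# Open-to-close return for the month: last close relative to first open.
monthly_return = (last_close - first_open) / first_open

comparison = pd.DataFrame({"question": question_change,
                           "open_to_close": monthly_return})
print(comparison)
```

On this toy data the two columns have opposite signs in both months, so a rising month would be counted as a decrease by the question's formula.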

To get the mean return for each calendar month, you can:

returns.groupby(returns.index.month).mean()

to get:

1    -0.006884
2     0.118938
3     0.012383
4    -0.007739
5     0.068220
6     0.017312
7     0.017662
8     0.085442
9    -0.007752
10    0.046918
11    0.077943
12   -0.019563
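
The question also asks for the average increase, the average decrease, and how often each occurs per month. One possible way to build that table from a `returns` series is sketched below. The return values are made up for illustration, and the column names simply mirror the ones used in the question:

```python
import pandas as pd

# Hypothetical monthly returns indexed by month-end date (not real ATVI data).
returns = pd.Series(
    [0.04, -0.02, 0.10, -0.05, 0.06, 0.03],
    index=pd.to_datetime(["2014-01-31", "2014-02-28", "2014-03-31",
                          "2015-01-31", "2015-02-28", "2015-03-31"]),
)

ups = returns[returns > 0]      # months that closed higher
downs = returns[returns < 0]    # months that closed lower

summary = pd.DataFrame({
    "Avg Inc.": ups.groupby(ups.index.month).mean(),
    "Inc": ups.groupby(ups.index.month).count(),
    "Avg Dec.": downs.groupby(downs.index.month).mean(),
    "Dec": downs.groupby(downs.index.month).count(),
}).reindex(sorted(returns.index.month.unique()))

# A month with no increases (or no decreases) has a count of 0, not NaN.
for col in ["Inc", "Dec"]:
    summary[col] = summary[col].fillna(0).astype(int)

print(summary)
```

Months that never fell keep `NaN` in the average-decrease column, which avoids the division by zero that `get_data` has to guard against.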
