Compare Same Days of Different Years in Pandas

Question

I'm a newbie still trying to figure out pandas; split/apply/combine is just starting to make sense, but I'm not there yet.

I'm trying to get the historical high and low sales call numbers for each calendar day across previous years (2013-2015), and then have another column showing on which days the 2016 calls were higher than the previous high or lower than the previous low, and what those values were.

Here's my attempt so far:

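import pandas as pd

df = pd.read_csv('filename.csv')
df['Date'] = pd.to_datetime(df['Date'])
# keep only the 2013-2015 history (.copy() avoids SettingWithCopyWarning below)
df = df[(df['Date'] >= '01/01/2013') & (df['Date'] <= '12/31/2015')].copy()
df['Month'] = df.Date.dt.month
df['Day'] = df.Date.dt.day

# record highs: maximum of the MAX rows for each calendar month/day
dfMAX = df[df['Element'] == "MAX"]
dfMAX = (dfMAX.groupby([dfMAX.Date.dt.month, dfMAX.Date.dt.day, 'Element'])
            [['Data_Value']]
            .agg(['max']))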

This gets me the max values per day, and I know I could repeat this for the min values. I'm struggling to figure out how to put the two back together with the MultiIndex, and then how to get the 2016 values that were either higher than the recorded max or lower than the recorded min.
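One way to put the pieces back together is to aggregate the MAX rows and the MIN rows separately, grouping on Month and Day, and let `pd.concat` align the two results on their shared (Month, Day) MultiIndex. This is a minimal sketch on a few made-up rows: the values are hypothetical, while the column names Date, Element and Data_Value are taken from the code above.

```python
import pandas as pd

# Hypothetical sample in the question's layout: Date, Element (MAX/MIN), Data_Value
df = pd.DataFrame({
    "Date": pd.to_datetime(["2013-01-01", "2013-01-01", "2014-01-01", "2014-01-01",
                            "2013-01-02", "2013-01-02", "2014-01-02", "2014-01-02"]),
    "Element": ["MAX", "MIN", "MAX", "MIN", "MAX", "MIN", "MAX", "MIN"],
    "Data_Value": [1900, 60, 1850, 50, 2100, 23, 2000, 40],
})
df["Month"] = df["Date"].dt.month
df["Day"] = df["Date"].dt.day

# Record high per calendar day, taken only from the MAX rows
highs = (df[df["Element"] == "MAX"]
         .groupby(["Month", "Day"])["Data_Value"].max()
         .rename("Max"))
# Record low per calendar day, taken only from the MIN rows
lows = (df[df["Element"] == "MIN"]
        .groupby(["Month", "Day"])["Data_Value"].min()
        .rename("Min"))

# Both Series share the same (Month, Day) MultiIndex, so concat aligns them row by row
records = pd.concat([lows, highs], axis=1)
print(records)
```

Selecting a single column (`["Data_Value"]` rather than `[["Data_Value"]]`) keeps each result a Series, so the combined frame gets flat `Min`/`Max` columns instead of a column MultiIndex.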

I'm not worried about leap year data, Feb 29th can be dropped/ignored for this.
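Dropping Feb 29 only needs one boolean mask on the month and day. Here is a minimal sketch with made-up values, assuming a Date column already parsed as datetime:

```python
import pandas as pd

# Hypothetical rows around the 2016 leap day
df = pd.DataFrame({
    "Date": pd.to_datetime(["2016-02-28", "2016-02-29", "2016-03-01"]),
    "Data_Value": [10, 20, 30],
})

# True only for February 29; invert it to keep every other day
is_leap_day = (df["Date"].dt.month == 2) & (df["Date"].dt.day == 29)
df = df[~is_leap_day]
print(df)
```

Filtering before any grouping ensures the leap day never contributes to a day's record.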

The output would look something like this:

Day of Year   Min   Max     2016
    1/1       50   1900     
    1/2       23   2100     2102
    1/3       90   1800      85
    1/4       89   1750
    1/5       50   2309      45
    1/6       44   5600     5649

I've posted a csv file here: https://drive.google.com/open?id=0B4xdnV0LFZI1dUE3ZFBxdWFQOGc

Thanks for your help.

Answer

I would tackle this using the following steps:

  1. Make sure Date is a datetime column dtype. Then create a new column in your dataframe called DayofYear using .dt.dayofyear.
  2. Split your dataframe into two dataframes: one for 2013 through 2015 and one for 2016.
  3. Group the 2013-2015 dataframe by DayofYear and use .agg with min and max to build a dataframe of daily records for 2013 through 2015.
  4. Next, merge this daily-records dataframe with the 2016 dataframe using pd.merge: set the index of the 2016 dataframe to DayofYear and merge on the indexes with left_index=True and right_index=True.
  5. Finally, use boolean indexing to select only the rows of the merged dataframe where the 2016 value is below the min column or above the max column.
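The five steps can be sketched end to end on a tiny made-up dataset. The values are hypothetical; the Date, Element, Value and DayofYear column names follow the answer's code, and the variable names hist, cur, records, merged and broken are illustrative only.

```python
import pandas as pd

# Hypothetical toy data in the answer's layout: Date, Element, Value
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2013-01-01", "2014-01-01", "2015-01-01", "2016-01-01",
        "2013-01-02", "2014-01-02", "2015-01-02", "2016-01-02",
    ]),
    "Element": ["MAX", "MIN", "MAX", "MAX", "MAX", "MIN", "MIN", "MIN"],
    "Value": [1500, 600, 1700, 1900, 1200, 300, 400, 350],
})

# Step 1: day-of-year key
df["DayofYear"] = df["Date"].dt.dayofyear

# Step 2: split the 2013-2015 history from 2016
hist = df[df["Date"] < "2016-01-01"]
cur = df[df["Date"] >= "2016-01-01"]

# Step 3: per-day record low and high over 2013-2015
records = hist.groupby("DayofYear")["Value"].agg(["min", "max"])

# Step 4: align each 2016 row with that day's records on the shared index
merged = pd.merge(records, cur.set_index("DayofYear"),
                  left_index=True, right_index=True)

# Step 5: keep only 2016 values outside the historical range
broken = merged[(merged["Value"] < merged["min"]) | (merged["Value"] > merged["max"])]
print(broken)
```

Day 1's 2016 value (1900) exceeds its 2013-2015 high of 1700, so it is kept. Day 2's 350 lies within 300-1200, so it is dropped.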

You should get something like this with your data:

           min   max       Date Element  Value
DayofYear                                     
1          545  1812 2016-01-01     MAX   1887
3          108  1815 2016-01-03     MAX   1906
4          496  1618 2016-01-04     MAX   1701
6          455  1864 2016-01-06     MIN    169
8          511  1771 2016-01-08     MIN    232

See my code below.

import pandas as pd
df = pd.read_csv('Downloads/Day_Over_Day.csv', parse_dates=['Date'])
# 1. Day-of-year key
df['DayofYear'] = df['Date'].dt.dayofyear
# 2a./2b. Split into 2013-2015 and 2016
df_13_15 = df.query('Date < "2016-01-01"')
df_2016 = df.query('Date >= "2016-01-01"')
# 3. & 4. Per-day min/max over 2013-2015, merged with the 2016 rows on DayofYear
df_Overlay = pd.merge(df_13_15.groupby('DayofYear')['Value'].agg(['min', 'max']), df_2016.set_index('DayofYear'), left_index=True, right_index=True)
# 5. Keep only 2016 values below the record low or above the record high
print(df_Overlay[(df_Overlay['Value'] < df_Overlay['min']) | (df_Overlay['Value'] > df_Overlay['max'])].head())
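One caveat with keying on `dt.dayofyear`: 2016 is a leap year, so every 2016 date after Feb 28 gets a day number one higher than the same calendar date in 2013-2015 (March 1 is day 61 in 2016 but day 60 in 2015). The sketch below is a hedged variant, not the answer's method. It drops Feb 29, as the question allows, and keys on (month, day) instead. The rows are made up and the variable names are illustrative; only the Date and Value columns come from the answer.

```python
import pandas as pd

# Hypothetical rows: March 1 in each year plus the 2016 leap day
df = pd.DataFrame({
    "Date": pd.to_datetime(["2013-03-01", "2014-03-01", "2015-03-01",
                            "2016-02-29", "2016-03-01"]),
    "Value": [500, 700, 600, 999, 800],
})

# [60, 60, 60, 60, 61]: 2016-02-29 collides with March 1 of the other years,
# while 2016-03-01 gets 61
print(df["Date"].dt.dayofyear.tolist())

# Drop Feb 29, then key on calendar month and day instead
df = df[~((df["Date"].dt.month == 2) & (df["Date"].dt.day == 29))].copy()
df["Month"] = df["Date"].dt.month
df["Day"] = df["Date"].dt.day

hist = df[df["Date"] < "2016-01-01"]
cur = df[df["Date"] >= "2016-01-01"]

# Same min/max + merge + boolean-mask pattern, on a (Month, Day) MultiIndex
records = hist.groupby(["Month", "Day"])["Value"].agg(["min", "max"])
merged = pd.merge(records, cur.set_index(["Month", "Day"]),
                  left_index=True, right_index=True)
broken = merged[(merged["Value"] < merged["min"]) | (merged["Value"] > merged["max"])]
print(broken)
```

With these rows, the 2016 March 1 value (800) is compared against the March 1 records (500-700) and is flagged as a new high.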
