取 N 最后天的平均值 [英] Taking the mean value of N last days

查看:62
本文介绍了取 N 最后天的平均值的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有这个数据框:

ID      Date  X  123_Var  456_Var  789_Var
 A  16-07-19  3      777      250      810
 A  17-07-19  9      637      121      529
 A  20-07-19  2      295      272      490
 A  21-07-19  3      778      600      544
 A  22-07-19  6      741      792      907
 A  25-07-19  6      435      416      820
 A  26-07-19  8      590      455      342
 A  27-07-19  6      763      476      753
 A  02-08-19  6      717      211      454
 A  03-08-19  6      152      442      475
 A  05-08-19  6      564      340      302
 A  07-08-19  6      105      929      633
 A  08-08-19  6      948      366      586
 B  07-08-19  4      509      690      406
 B  08-08-19  2      413      725      414
 B  12-08-19  2      170      702      912
 B  13-08-19  3      851      616      477
 B  14-08-19  9      475      447      555
 B  15-08-19  1      412      403      708
 B  17-08-19  2      299      537      321
 B  18-08-19  4      310      119      125

我想显示 n 过去几天的 mean 值(使用 Date 列),不包括当天的值.

I want to show the mean value of n last days (using Date column), excluding the value of current day.

我正在使用此代码(我应该怎么做才能解决这个问题?):

I'm using this code (what should I do to fix this?):

n = 4

cols = list(df.filter(regex='Var').columns)

df = df.set_index('Date')

df[cols] = (df.groupby('ID').rolling(window=f'{n}D')[cols].mean()
         .reset_index(0,drop=True).add_suffix(f'_{n}'))

df.reset_index(inplace=True)

预期结果:

ID      Date  X  123_Var  456_Var  789_Var   123_Var_4   456_Var_4  789_Var_4
 A  16-07-19  3      777      250      810         NaN         NaN        NaN
 A  17-07-19  9      637      121      529  777.000000  250.000000      810.0
 A  20-07-19  2      295      272      490  707.000000  185.500000      669.5
 A  21-07-19  3      778      600      544  466.000000  196.500000      509.5
 A  22-07-19  6      741      792      907  536.500000  436.000000      517.0
 A  25-07-19  6      435      416      820  759.500000  696.000000      725.5
 A  26-07-19  8      590      455      342  588.000000  604.000000      863.5
 A  27-07-19  6      763      476      753  512.500000  435.500000      581.0
 A  02-08-19  6      717      211      454         NaN         NaN        NaN
 A  03-08-19  6      152      442      475  717.000000  211.000000      454.0
 A  05-08-19  6      564      340      302  434.500000  326.500000      464.5
 A  07-08-19  6      105      929      633  358.000000  391.000000      388.5
 A  08-08-19  6      948      366      586  334.500000  634.500000      467.5
 B  07-08-19  4      509      690      406         NaN         NaN        NaN
 B  08-08-19  2      413      725      414  509.000000  690.000000      406.0
 B  12-08-19  2      170      702      912  413.000000  725.000000      414.0
 B  13-08-19  3      851      616      477  291.500000  713.500000      663.0
 B  14-08-19  9      475      447      555  510.500000  659.000000      694.5
 B  15-08-19  1      412      403      708  498.666667  588.333333      648.0
 B  17-08-19  2      299      537      321  579.333333  488.666667      580.0
 B  18-08-19  4      310      119      125  395.333333  462.333333      528.0

  • 注意:数据框已更改.
  • 推荐答案

    I change unutbu solutionrolling 中工作:

    I change unutbu solution for working in rolling:

    df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
    
    n = 5
    cols = df.filter(regex='Var').columns
    df = df.set_index('Date')
    df_ = df.set_index('ID', append=True).swaplevel(1,0)
    df1 = df.groupby('ID').rolling(window=f'{n}D')[cols].count()
    df2 = df.groupby('ID').rolling(window=f'{n}D')[cols].mean()
    df3 = (df1.mul(df2)
              .sub(df_[cols])
              .div(df1[cols].sub(1)).add_suffix(f'_{n}')
              )
    df4 = df_.join(df3)
    

    <小时>

    print (df4)
                   X  123_Var  456_Var  789_Var   123_Var_5   456_Var_5  789_Var_5
    ID Date                                                                       
    A  2019-07-16  3      777      250      810         NaN         NaN        NaN
       2019-07-17  9      637      121      529  777.000000  250.000000      810.0
       2019-07-20  2      295      272      490  707.000000  185.500000      669.5
       2019-07-21  3      778      600      544  466.000000  196.500000      509.5
       2019-07-22  6      741      792      907  536.500000  436.000000      517.0
       2019-07-25  6      435      416      820  759.500000  696.000000      725.5
       2019-07-26  8      590      455      342  588.000000  604.000000      863.5
       2019-07-27  6      763      476      753  512.500000  435.500000      581.0
       2019-08-02  6      717      211      454         NaN         NaN        NaN
       2019-08-03  6      152      442      475  717.000000  211.000000      454.0
       2019-08-05  6      564      340      302  434.500000  326.500000      464.5
       2019-08-07  6      105      929      633  358.000000  391.000000      388.5
       2019-08-08  6      948      366      586  334.500000  634.500000      467.5
    B  2019-08-07  4      509      690      406         NaN         NaN        NaN
       2019-08-08  2      413      725      414  509.000000  690.000000      406.0
       2019-08-12  2      170      702      912  413.000000  725.000000      414.0
       2019-08-13  3      851      616      477  170.000000  702.000000      912.0
       2019-08-14  9      475      447      555  510.500000  659.000000      694.5
       2019-08-15  1      412      403      708  498.666667  588.333333      648.0
       2019-08-17  2      299      537      321  579.333333  488.666667      580.0
       2019-08-18  4      310      119      125  395.333333  462.333333      528.0
    

    这篇关于取 N 最后天的平均值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆