Groupby and lag all columns of a dataframe?

Problem description

I want to lag every column in a dataframe, by group. I have a frame like this:

import numpy as np
import pandas as pd

index = pd.date_range('2015-11-20', periods=6, freq='D')

df = pd.DataFrame(dict(time=index, grp=['A']*3 + ['B']*3, col1=[1,2,3]*2,
    col2=['a','b','c']*2)).set_index(['time','grp'])

which looks like

                col1 col2
time       grp           
2015-11-20 A       1    a
2015-11-21 A       2    b
2015-11-22 A       3    c
2015-11-23 B       1    a
2015-11-24 B       2    b
2015-11-25 B       3    c

and I want it to look like this:

                col1 col2 col1_lag col2_lag
time       grp                     
2015-11-20 A       1    a        2        b
2015-11-21 A       2    b        3        c
2015-11-22 A       3    c       NA       NA
2015-11-23 B       1    a        2        b
2015-11-24 B       2    b        3        c
2015-11-25 B       3    c       NA       NA

This question manages the result for a single column, but I have an arbitrary number of columns, and I want to lag all of them. I can use groupby and apply, but apply runs the shift function over each column independently, and it doesn't seem to like receiving an [nrow, 2] shaped dataframe in return. Is there perhaps a function like apply that acts on the whole group sub-frame? Or is there a better way to do this?
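On the apply question: `DataFrameGroupBy.apply` does pass each group to the function as a whole sub-frame, so a function that shifts the entire sub-frame and returns a frame of the same shape is accepted. The sketch below assumes a reasonably recent pandas; `group_keys=False` is used so the group key is not prepended as an extra index level, which lets the result be joined back onto the original frame.

```python
import pandas as pd

# Rebuild the example frame from the question.
index = pd.date_range('2015-11-20', periods=6, freq='D')
df = pd.DataFrame(dict(time=index, grp=['A']*3 + ['B']*3, col1=[1, 2, 3]*2,
                       col2=['a', 'b', 'c']*2)).set_index(['time', 'grp'])

# apply() receives each group as a full DataFrame (all columns at once),
# so shifting the whole sub-frame works; group_keys=False keeps the index
# identical to the original (no extra "grp" level added in front).
lagged = df.groupby(level="grp", group_keys=False).apply(lambda g: g.shift(-1))

# Align on the (time, grp) index and append the shifted columns.
result = df.join(lagged.add_suffix("_lag"))
print(result)
```

This works, but calling `shift` directly on the groupby object avoids the per-group Python call that `apply` makes.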

Solution

IIUC, you can simply group by the index level "grp" and then shift by -1:

>>> shifted = df.groupby(level="grp").shift(-1)
>>> df.join(shifted.rename(columns=lambda x: x+"_lag"))
                col1 col2  col1_lag col2_lag
time       grp                              
2015-11-20 A       1    a         2        b
2015-11-21 A       2    b         3        c
2015-11-22 A       3    c       NaN      NaN
2015-11-23 B       1    a         2        b
2015-11-24 B       2    b         3        c
2015-11-25 B       3    c       NaN      NaN
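The same approach as one self-contained script, for reference. Note that `shift(-1)` pulls each value from the *next* row of the same group, which is what the desired output asks for; `shift(1)` would pull from the previous row instead. Because groupby `shift` works on row order, the sketch sorts by the `time` level first (a precaution; the example is already sorted). `add_suffix("_lag")` is used here as an equivalent of `rename(columns=lambda x: x+"_lag")`.

```python
import pandas as pd

index = pd.date_range('2015-11-20', periods=6, freq='D')
df = pd.DataFrame(dict(time=index, grp=['A']*3 + ['B']*3, col1=[1, 2, 3]*2,
                       col2=['a', 'b', 'c']*2)).set_index(['time', 'grp'])

# shift() follows row order within each group, so put rows in time order.
df = df.sort_index(level="time")

# Shift every column at once, separately within each "grp" group; the last
# row of each group has no successor and becomes NaN.
shifted = df.groupby(level="grp").shift(-1)

# Same effect as rename(columns=lambda x: x + "_lag").
result = df.join(shifted.add_suffix("_lag"))
print(result)
```

Since a NaN is introduced, the integer column `col1_lag` comes back as `float64`.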