Using cumsum in pandas on group()

Question

From a Pandas newbie: I have data that looks essentially like this -

import pandas as pd

data1 = pd.DataFrame({'Dir': ['E','E','W','W','E','W','W','E'],
                      'Bool': ['Y','N','Y','N','Y','N','Y','N'],
                      'Data': [4,5,6,7,8,9,10,11]},
                     index=pd.DatetimeIndex(['12/30/2000','12/30/2000','12/30/2000','1/2/2001',
                                             '1/3/2001','1/3/2001','12/30/2000','12/30/2000']))
data1
Out[1]: 
           Bool  Data Dir
2000-12-30    Y     4   E
2000-12-30    N     5   E
2000-12-30    Y     6   W
2001-01-02    N     7   W
2001-01-03    Y     8   E
2001-01-03    N     9   W
2000-12-30    Y    10   W
2000-12-30    N    11   E

And I want to group it by multiple levels, then do a cumsum():

For example, something like running_sum = data1.groupby(['Bool','Dir']).cumsum() <- (doesn't work)

with output that would look something like:

Bool Dir Date        running_sum
N    E   2000-12-30           16
     W   2001-01-02            7
         2001-01-03           16
Y    E   2000-12-30            4
         2001-01-03           12
     W   2000-12-30           16

My "like" code is clearly not even close. I have made a number of attempts and learned many new things about how not to do this.

Thanks for any help you can give.

Solution

Try this:

data2 = data1.reset_index()
data3 = data2.set_index(["Bool", "Dir", "index"])   # index is the new column created by reset_index
running_sum = data3.groupby(level=[0,1,2]).sum().groupby(level=[0,1]).cumsum()
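The same two-step idea (sum per Bool/Dir/date, then cumsum within each Bool/Dir group) can be sketched as a self-contained script. This is a variation, not the exact code above: it assumes a reasonably recent pandas release and skips reset_index/set_index by naming the date index and grouping on that level name directly. The label "Date" is an assumption chosen to match the desired output in the question.

```python
import pandas as pd

# Same sample data as in the question.
data1 = pd.DataFrame(
    {"Dir": ["E", "E", "W", "W", "E", "W", "W", "E"],
     "Bool": ["Y", "N", "Y", "N", "Y", "N", "Y", "N"],
     "Data": [4, 5, 6, 7, 8, 9, 10, 11]},
    index=pd.DatetimeIndex(["12/30/2000", "12/30/2000", "12/30/2000", "1/2/2001",
                            "1/3/2001", "1/3/2001", "12/30/2000", "12/30/2000"]),
)

# Name the date index (assumed label "Date") so it can be used as a group key.
data = data1.rename_axis("Date")

# Step 1: collapse duplicate dates by summing Data per (Bool, Dir, Date).
# Column names and index level names can be mixed in the same groupby list.
daily = data.groupby(["Bool", "Dir", "Date"])["Data"].sum()

# Step 2: running total within each (Bool, Dir) group, in date order.
running_sum = daily.groupby(level=["Bool", "Dir"]).cumsum().rename("running_sum")

print(running_sum)
```

The result is indexed by (Bool, Dir, Date) and holds the values 16, 7, 16, 4, 12, 16, matching the desired output in the question.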

The reason you cannot simply use cumsum on data3 has to do with how your data is structured. Grouping by Bool and Dir and applying an aggregation function (sum, mean, etc.) produces a DataFrame smaller than the one you started with, because the function aggregates values based on your group keys. However, cumsum is not an aggregation function: it returns a DataFrame the same size as the one it is called on. Your data has several rows that share the same Bool, Dir and date (for example, the two N/E rows on 2000-12-30), so calling cumsum directly gives one running value per row rather than one per date, and depending on the pandas version it may raise an error instead. That's why I called sum first: grouping by all three keys collapses the duplicate dates and returns a DataFrame in the correct input format for cumsum.
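The size difference between an aggregation and a transform can be checked directly. A minimal sketch, assuming a recent pandas release (older versions may behave differently, as noted above): here the direct groupby(...).cumsum() call runs, but it returns one value per input row, so the two N/E rows dated 2000-12-30 come out separately (5 and 16) instead of as a single 16.

```python
import pandas as pd

# Same sample data as in the question.
data1 = pd.DataFrame(
    {"Dir": ["E", "E", "W", "W", "E", "W", "W", "E"],
     "Bool": ["Y", "N", "Y", "N", "Y", "N", "Y", "N"],
     "Data": [4, 5, 6, 7, 8, 9, 10, 11]},
    index=pd.DatetimeIndex(["12/30/2000", "12/30/2000", "12/30/2000", "1/2/2001",
                            "1/3/2001", "1/3/2001", "12/30/2000", "12/30/2000"]),
)

grouped = data1.groupby(["Bool", "Dir"])["Data"]

# Aggregation: one value per (Bool, Dir) group, so 4 rows.
totals = grouped.sum()

# Transform: one value per input row, in the original row order, so 8 rows.
running = grouped.cumsum()

print(len(data1), len(totals), len(running))  # 8 4 8
print(running.tolist())  # [4, 5, 6, 7, 12, 16, 16, 16]
```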

Sorry if I haven't explained this well enough. Maybe someone else could help me out?
