Conditional count of cumulative sum in a DataFrame - loop through columns


Problem description

I'm trying to compute a cumulative count within a dataframe that resets based on the sign of each value. The idea is to do the same exercise for each column separately.

For example, let's assume I have the following dataframe:

import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, -1, -1, 1, 1, 1, 1, -1, -1, -1],
                   'B': [1, 1, -1, -1, -1, 1, 1, 1, -1, -1, -1, 1]},
                  index=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])

For each column, I want to keep a running count until the sign changes, at which point the count should reset to 1. For the example above, I expect the following result:

df1 = pd.DataFrame({'A_cumcount': [1, 2, 3, 1, 2, 1, 2, 3, 4, 1, 2, 3],
                    'B_cumcount': [1, 2, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1]},
                   index=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])

A similar issue has been discussed here: Pandas: conditional rolling count

I tried the following code:

nb_col = len(df.columns)  # number of original columns in the dataframe

for i in range(nb_col):  # loop over the original columns only
    col = df.columns[i]        # read the column name
    name = col + '_cumcount'   # name of the new result column

    # a new group starts whenever the value differs from the previous row;
    # cumcount() numbers the rows within each group from 0, hence the +1
    df[name] = df.groupby((df[col] != df[col].shift(1)).cumsum()).cumcount() + 1

My question is: is there a way to avoid this for loop, so that I don't have to append a new column on each iteration and the computation runs faster? Thank you.

Answers received (all working fine):

From @nixon:

df.apply(lambda x: x.groupby(x.diff().ne(0).cumsum()).cumcount()+1).add_suffix('_cumcount')

From @jezrael:

df1 = (df.apply(lambda x: x.groupby((x != x.shift()).cumsum()).cumcount() + 1).add_suffix('_cumcount'))

From @Scott Boston:

df.apply(lambda x: x.groupby(x.diff().bfill().ne(0).cumsum()).cumcount() + 1)
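
To check that the variants quoted above agree, here is a minimal sketch that runs all three on the example dataframe and compares the results. The names via_diff, via_shift and via_bfill are only illustrative and do not come from the original answers.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 1, 1, -1, -1, 1, 1, 1, 1, -1, -1, -1],
    'B': [1, 1, -1, -1, -1, 1, 1, 1, -1, -1, -1, 1],
})

# Variant based on diff(): a non-zero difference marks the start of a new run.
via_diff = df.apply(lambda x: x.groupby(x.diff().ne(0).cumsum()).cumcount() + 1)

# Variant based on shift(): a value unequal to the previous row marks a new run.
via_shift = df.apply(lambda x: x.groupby((x != x.shift()).cumsum()).cumcount() + 1)

# Variant based on diff().bfill(): same idea, with the leading NaN back-filled.
via_bfill = df.apply(lambda x: x.groupby(x.diff().bfill().ne(0).cumsum()).cumcount() + 1)

# All three should produce the same counts on this data.
same = via_diff.equals(via_shift) and via_shift.equals(via_bfill)
print(same)
```

One difference worth keeping in mind: the diff()-based variants subtract neighbouring values, so they assume numeric columns, while the shift()-based comparison only needs the values to be comparable for equality.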

Recommended answer

I think a loop over the columns is needed in pandas, e.g. via apply:

df1 = (df.apply(lambda x: x.groupby((x != x.shift()).cumsum()).cumcount() + 1)
         .add_suffix('_cumcount'))
print (df1)
    A_cumcount  B_cumcount
0            1           1
1            2           2
2            3           1
3            1           2
4            2           3
5            1           1
6            2           2
7            3           3
8            4           1
9            1           2
10           2           3
11           3           1
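
To see why the groupby key works, it may help to break the one-liner into its intermediate steps for a single column. The sketch below does that for column A. The names changed, run_id, count and steps are illustrative only.

```python
import pandas as pd

a = pd.Series([1, 1, 1, -1, -1, 1, 1, 1, 1, -1, -1, -1], name='A')

# True where the value differs from the previous row, i.e. where a new run
# starts. The first row is compared with NaN, so it is always True.
changed = a != a.shift()

# The cumulative sum of the booleans gives every run its own integer label.
run_id = changed.cumsum()

# cumcount() numbers the rows inside each run from 0, so add 1.
count = a.groupby(run_id).cumcount() + 1

# Put the steps side by side for inspection.
steps = pd.DataFrame({'A': a, 'changed': changed, 'run_id': run_id, 'count': count})
print(steps)
```

Because the key is built from equality with the previous row rather than from the sign itself, this counts runs of identical values. For data containing only 1 and -1, as in the example, that coincides with runs of the same sign.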
