pandas cumsum +多个列上的cumcount [英] Pandas cumsum + cumcount on multiple columns
问题描述
喂,
我有以下DataFrame
I have the following DataFrame
stores = [1,2,3,4,5]
weeks = [1,1,1,1,1]
df = pd.DataFrame({'Stores' : stores,
'Weeks' : weeks})
df = pd.concat([df]*53)
df['Weeks'] = df['Weeks'].add(df.groupby('Stores').cumcount())
df['Target'] = np.random.randint(400,600,size=len(df))
df['Actual'] = np.random.randint(350,800,size=len(df))
df['Variance %'] = (df['Target'] - df['Actual']) / df['Target']
df.loc[df['Variance %'] >= 0.01, 'Status'] = 'underTarget'
df.loc[df['Variance %'] <= 0.01, 'Status'] = 'overTarget'
df['Status'] = df['Status'].fillna('atTarget')
df.sort_values(['Stores','Weeks'],inplace=True)
这给了我以下内容
print(df.head())
Stores Weeks Target Actual Variance % Status
0 1 1 430 605 -0.406977 overTarget
0 1 2 549 701 -0.276867 overTarget
0 1 3 471 509 -0.080679 overTarget
0 1 4 549 378 0.311475 underTarget
0 1 5 569 708 -0.244288 overTarget
0 1 6 574 650 -0.132404 overTarget
0 1 7 466 623 -0.336910 overTarget
现在我要做的是对商店的累计计数进行统计,这些商店超出或低于目标,但是在状态更改时将其重置.
now what I'm trying to do is do a cumulative count of Stores where they were either over or undertarget but reset when the status changes.
我认为这将是执行此操作(以及此操作的许多变体)的最佳方法,但这不会重置计数器.
I thought this would be the best way to do this (and many variants of this) but this does not reset the counter.
s = df.groupby(['Stores','Weeks','Status'])['Status'].shift().ne(df['Status'])
df['Count'] = s.groupby(df['Stores']).cumsum()
我的逻辑是将我的相关列分组,然后进行!=移位以重置累积金额
my logic was to group by my relevant columns, and do a != shift to reset the cumsum
自然,我已经搜寻了许多不同的问题,但是我似乎无法弄清楚.有谁愿意向我解释解决这个问题的最佳方法是什么?
Naturally I've scoured lots of different questions but I can't seem to figure this out. Would anyone be so kind to explain to me what would be the best method to tackle this problem?
我希望这里的所有内容都是清晰且可重复的.如果您需要任何其他信息,请告诉我.
I hope everything here is clear and reproducible. Please let me know if you need any additional information.
预期产量
Stores Weeks Target Actual Variance % Status Count
0 1 1 430 605 -0.406977 overTarget 1
0 1 2 549 701 -0.276867 overTarget 2
0 1 3 471 509 -0.080679 overTarget 3
0 1 4 549 378 0.311475 underTarget 1 # Reset here as status changes
0 1 5 569 708 -0.244288 overTarget 1 # Reset again.
0 1 6 574 650 -0.132404 overTarget 2
0 1 7 466 623 -0.336910 overTarget 3
推荐答案
在cumsum
s=df.groupby('Stores')['Status'].apply(lambda x : x.ne(x.shift()).ne(0).cumsum())
df['Count']=df.groupby([df.Stores,s]).cumcount()+1
这篇关于 pandas cumsum +多个列上的cumcount的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!