pandas - 将累计值转换为实际值 [英] Pandas - convert cumulative value to actual value
本文介绍了 pandas - 将累计值转换为实际值的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!
问题描述
date,site,country_code,kind,ID,rank,votes, session,avg_score,count
2017-03-20,website1,US,0,84,226,0.0,15.0,3.370812,53.0
2017-03-21,website1,US,0,84,214,0.0, 15.0,3.370812,53.0
2017-03-22,website1,US,0,84,226,0.0,16.0,3.370812,53.0
2017-03-23,website1,US,0,84,234,0.0, 16.0,3.369048,54.0
2017-03-24,website1,US,0,84,226,0.0,16.0,3.369048,54.0
2017-03-25,website1,US,0,84,212,0.0, 16.0,3.369048,54.0
2017-03-26,website1,US,0,84,228,0.0,16.0,3.369048,54.0
2017-02-15,website2,AU,1,91,144,4.0, 148.0,4.727272,521.0
2017-02-16,website2,AU,1,91,144,3.0,147.0,4.727272,524.0
2017-02-17,website2,AU,1,91,100,4.0, 148.0,4.727272,524.0
2017-02-18,website2,AU,1,91,118,6.0,149.0,4.727272,527.0
2017-02-19,website2,AU,1,91,114,4.0, 151.0,4.727272,529.0
我需要做的是找到一个特定的
日期+网站+国家+种类+ ID元组的实际数量,这将导致:
日期,网站,country_code,种类,ID,排名,投票,会话,avg_score,count
2017-03-20,website1,US,0,84,226,0.0, 15.0,3.370812,0.0
2017-03-21,website1,US,0,84,214,0.0,15.0,3.370812,0.0
2017-03-22,website1,US,0,84,226,0.0, 16.0,3.370812,0.0
2017-03-23,website1,US,0,84,234,0.0,16.0,3.369048,1.0
2017-03-24,website1,US,0,84,226,0.0, 16.0,3.369048,0.0
2017-03-25,website1,US,0,84,212,0.0,16.0,3.369048,0.0
2017-03-26,website1,US,0,84,228,0.0, 16.0,3.369048,0.0
2017-02-15,website2,AU,1,91,144,4.0,148.0,4.727272,0.0
2017-02-16,website2,AU,1,91,144,3.0, 147.0,4.727272,3.0
2017-02-17,website2,AU,1,91,100,4.0,148.0,4.727272,0.0
2017-02-18,website2,AU,1,91,118,6.0, 149.0,4.727272,3.0
2017-02-19,website2,AU,1,91,114,4.0,151.0,4.727272,2.0
我知道这将涉及 groupby
呼叫,但我不知道该怎么做。假设元组的第一个实例的计数为 0
。
任何帮助都会令人敬畏。谢谢
解决方案使用
groupby
+ diff
, cumsum
的倒数。 cols = ['site','country_code','kind','ID']
df ['count'] = df.groupby(cols)['count']。diff()。fillna(0)
print(df ['count'])
0 0.0
1 0.0
2 0.0
3 1.0
4 0.0
5 0.0
6 0.0
7 0.0
8 3.0
9 0.0
10 3.0
11 2.0
名称:count,dtype:float64
感谢MaxU指出错误!
Let's say my dataframe looks something like this:
date,site,country_code,kind,ID,rank,votes,sessions,avg_score,count
2017-03-20,website1,US,0,84,226,0.0,15.0,3.370812,53.0
2017-03-21,website1,US,0,84,214,0.0,15.0,3.370812,53.0
2017-03-22,website1,US,0,84,226,0.0,16.0,3.370812,53.0
2017-03-23,website1,US,0,84,234,0.0,16.0,3.369048,54.0
2017-03-24,website1,US,0,84,226,0.0,16.0,3.369048,54.0
2017-03-25,website1,US,0,84,212,0.0,16.0,3.369048,54.0
2017-03-26,website1,US,0,84,228,0.0,16.0,3.369048,54.0
2017-02-15,website2,AU,1,91,144,4.0,148.0,4.727272,521.0
2017-02-16,website2,AU,1,91,144,3.0,147.0,4.727272,524.0
2017-02-17,website2,AU,1,91,100,4.0,148.0,4.727272,524.0
2017-02-18,website2,AU,1,91,118,6.0,149.0,4.727272,527.0
2017-02-19,website2,AU,1,91,114,4.0,151.0,4.727272,529.0
The count
column at the very end is a cumulative count.
What I need to do is find the actual count for a particular
date+site+country+kind+ID tuple, which would result in:
date,site,country_code,kind,ID,rank,votes,sessions,avg_score,count
2017-03-20,website1,US,0,84,226,0.0,15.0,3.370812,0.0
2017-03-21,website1,US,0,84,214,0.0,15.0,3.370812,0.0
2017-03-22,website1,US,0,84,226,0.0,16.0,3.370812,0.0
2017-03-23,website1,US,0,84,234,0.0,16.0,3.369048,1.0
2017-03-24,website1,US,0,84,226,0.0,16.0,3.369048,0.0
2017-03-25,website1,US,0,84,212,0.0,16.0,3.369048,0.0
2017-03-26,website1,US,0,84,228,0.0,16.0,3.369048,0.0
2017-02-15,website2,AU,1,91,144,4.0,148.0,4.727272,0.0
2017-02-16,website2,AU,1,91,144,3.0,147.0,4.727272,3.0
2017-02-17,website2,AU,1,91,100,4.0,148.0,4.727272,0.0
2017-02-18,website2,AU,1,91,118,6.0,149.0,4.727272,3.0
2017-02-19,website2,AU,1,91,114,4.0,151.0,4.727272,2.0
I know this would involve a groupby
call but I have no idea what to do beyond that. Let's assume that the very first instance of the tuple would have a count of 0
.
Any help would awesome. Thanks
解决方案
Use groupby
+ diff
, the inverse of cumsum
.
cols = ['site', 'country_code', 'kind', 'ID']
df['count'] = df.groupby(cols)['count'].diff().fillna(0)
print(df['count'])
0 0.0
1 0.0
2 0.0
3 1.0
4 0.0
5 0.0
6 0.0
7 0.0
8 3.0
9 0.0
10 3.0
11 2.0
Name: count, dtype: float64
Thanks to MaxU for pointing out the error!
这篇关于 pandas - 将累计值转换为实际值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!
查看全文