Pandas - convert cumulative value to actual value

Views: 135 | Published: 2018/5/30 14:23:44 | Tags: python, pandas, group-by, pandas-groupby


Question


Let's say my dataframe looks something like this:

date,site,country_code,kind,ID,rank,votes,sessions,avg_score,count
2017-03-20,website1,US,0,84,226,0.0,15.0,3.370812,53.0
2017-03-21,website1,US,0,84,214,0.0,15.0,3.370812,53.0
2017-03-22,website1,US,0,84,226,0.0,16.0,3.370812,53.0
2017-03-23,website1,US,0,84,234,0.0,16.0,3.369048,54.0
2017-03-24,website1,US,0,84,226,0.0,16.0,3.369048,54.0
2017-03-25,website1,US,0,84,212,0.0,16.0,3.369048,54.0
2017-03-26,website1,US,0,84,228,0.0,16.0,3.369048,54.0
2017-02-15,website2,AU,1,91,144,4.0,148.0,4.727272,521.0
2017-02-16,website2,AU,1,91,144,3.0,147.0,4.727272,524.0
2017-02-17,website2,AU,1,91,100,4.0,148.0,4.727272,524.0
2017-02-18,website2,AU,1,91,118,6.0,149.0,4.727272,527.0
2017-02-19,website2,AU,1,91,114,4.0,151.0,4.727272,529.0

The count column at the very end is a cumulative count. What I need to do is find the actual count for a particular date+site+country+kind+ID tuple, which would result in:

date,site,country_code,kind,ID,rank,votes,sessions,avg_score,count
2017-03-20,website1,US,0,84,226,0.0,15.0,3.370812,0.0
2017-03-21,website1,US,0,84,214,0.0,15.0,3.370812,0.0
2017-03-22,website1,US,0,84,226,0.0,16.0,3.370812,0.0
2017-03-23,website1,US,0,84,234,0.0,16.0,3.369048,1.0
2017-03-24,website1,US,0,84,226,0.0,16.0,3.369048,0.0
2017-03-25,website1,US,0,84,212,0.0,16.0,3.369048,0.0
2017-03-26,website1,US,0,84,228,0.0,16.0,3.369048,0.0
2017-02-15,website2,AU,1,91,144,4.0,148.0,4.727272,0.0
2017-02-16,website2,AU,1,91,144,3.0,147.0,4.727272,3.0
2017-02-17,website2,AU,1,91,100,4.0,148.0,4.727272,0.0
2017-02-18,website2,AU,1,91,118,6.0,149.0,4.727272,3.0
2017-02-19,website2,AU,1,91,114,4.0,151.0,4.727272,2.0

I know this would involve a groupby call, but I have no idea what to do beyond that. Let's assume that the very first instance of each tuple has a count of 0. Any help would be awesome. Thanks

Solution

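Use groupby + diff. Within each group, diff is the inverse of cumsum: it subtracts the previous row's cumulative count from the current one. The first row of each group has no previous row, so diff returns NaN there, and fillna(0) turns that into the 0 you asked for. This assumes the rows are already in date order within each group, as they are in your sample.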

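# Columns that identify one series of cumulative counts
cols = ['site', 'country_code', 'kind', 'ID']
# Difference from the previous row within each group; the first row per group becomes 0
df['count'] = df.groupby(cols)['count'].diff().fillna(0)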

print(df['count'])
0     0.0
1     0.0
2     0.0
3     1.0
4     0.0
5     0.0
6     0.0
7     0.0
8     3.0
9     0.0
10    3.0
11    2.0
Name: count, dtype: float64
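
For anyone who wants to run this end to end, here is a minimal self-contained sketch built from the sample data in the question. Two things are my own assumptions and not part of the original answer: the rows are sorted by the group columns plus date first (diff depends on row order), and the result goes into a new column called actual_count (a hypothetical name) so the cumulative values are not overwritten.

```python
import io

import pandas as pd

# Sample data copied from the question.
csv_text = """date,site,country_code,kind,ID,rank,votes,sessions,avg_score,count
2017-03-20,website1,US,0,84,226,0.0,15.0,3.370812,53.0
2017-03-21,website1,US,0,84,214,0.0,15.0,3.370812,53.0
2017-03-22,website1,US,0,84,226,0.0,16.0,3.370812,53.0
2017-03-23,website1,US,0,84,234,0.0,16.0,3.369048,54.0
2017-03-24,website1,US,0,84,226,0.0,16.0,3.369048,54.0
2017-03-25,website1,US,0,84,212,0.0,16.0,3.369048,54.0
2017-03-26,website1,US,0,84,228,0.0,16.0,3.369048,54.0
2017-02-15,website2,AU,1,91,144,4.0,148.0,4.727272,521.0
2017-02-16,website2,AU,1,91,144,3.0,147.0,4.727272,524.0
2017-02-17,website2,AU,1,91,100,4.0,148.0,4.727272,524.0
2017-02-18,website2,AU,1,91,118,6.0,149.0,4.727272,527.0
2017-02-19,website2,AU,1,91,114,4.0,151.0,4.727272,529.0
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

cols = ["site", "country_code", "kind", "ID"]

# Assumption: make the date order explicit, since diff() works on
# consecutive rows within each group.
df = df.sort_values(cols + ["date"]).reset_index(drop=True)

# Per-group difference from the previous row; the first row of each
# group has no predecessor (NaN), which is replaced with 0.
# "actual_count" is a hypothetical column name chosen for this sketch.
df["actual_count"] = df.groupby(cols)["count"].diff().fillna(0)

print(df[["date", "site", "count", "actual_count"]])
```

Writing to a separate column is only a convenience for checking the result against the cumulative values; assigning back to df['count'] as in the answer works the same way.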

Thanks to MaxU for pointing out the error!
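
If the "inverse of cumsum" part is not obvious, the sketch below checks it on a trimmed-down subset of the question's values (only site and count are kept, which is my simplification). It takes the per-group differences, then rebuilds the cumulative column from each group's first value plus a per-group cumsum of those differences. The names increments, baseline and rebuilt are made up for the illustration.

```python
import pandas as pd

# Trimmed-down subset of the question's data: one grouping column
# and the cumulative count.
df = pd.DataFrame({
    "site": ["website1"] * 3 + ["website2"] * 3,
    "count": [53.0, 53.0, 54.0, 521.0, 524.0, 527.0],
})

grouped = df.groupby("site")["count"]

# Step 1: cumulative -> per-row increments (first row of each group is 0).
increments = grouped.diff().fillna(0)

# Step 2: increments -> cumulative again. The first value of each group
# is the baseline that diff() discarded, so it is added back.
baseline = grouped.transform("first")
rebuilt = baseline + increments.groupby(df["site"]).cumsum()

print(pd.DataFrame({"count": df["count"], "increments": increments, "rebuilt": rebuilt}))
```

The rebuilt column matches the original count column, which is what "diff undoes cumsum" means here, up to the starting value of each group.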
