Pandas groupby diff
Problem description
So my dataframe looks like this:
import pandas as pd
from io import StringIO  # pandas.compat.StringIO is not available in current pandas
d = StringIO('''
date,site,country,score
2018-01-01,google,us,100
2018-01-01,google,ch,50
2018-01-02,google,us,70
2018-01-03,google,us,60
2018-01-02,google,ch,10
2018-01-01,fb,us,50
2018-01-02,fb,us,55
2018-01-03,fb,us,100
2018-01-01,fb,es,100
2018-01-02,fb,gb,100
''')
df = pd.read_csv(d, sep=",")
Each site has a different score depending on the country. I'm trying to find the 1/3/5-day difference in scores for each site/country combination.
The output should be:
date,site,country,score,1_day_diff
2018-01-01,google,ch,50,0
2018-01-02,google,ch,10,-40
2018-01-01,google,us,100,0
2018-01-02,google,us,70,-30
2018-01-03,google,us,60,-10
2018-01-01,fb,es,100,0
2018-01-02,fb,gb,100,0
2018-01-01,fb,us,50,0
2018-01-02,fb,us,55,5
2018-01-03,fb,us,100,45
I first tried sorting by site/country/date and then grouping by site and country, but I can't wrap my head around getting a difference from a grouped object.
Recommended answer
First, sort the DataFrame; then all you need is groupby.diff():
df = df.sort_values(by=['site', 'country', 'date'])
df['diff'] = df.groupby(['site', 'country'])['score'].diff().fillna(0)
df
Out:
         date    site country  score  diff
8  2018-01-01      fb      es    100   0.0
9  2018-01-02      fb      gb    100   0.0
5  2018-01-01      fb      us     50   0.0
6  2018-01-02      fb      us     55   5.0
7  2018-01-03      fb      us    100  45.0
1  2018-01-01  google      ch     50   0.0
4  2018-01-02  google      ch     10 -40.0
0  2018-01-01  google      us    100   0.0
2  2018-01-02  google      us     70 -30.0
3  2018-01-03  google      us     60 -10.0
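The question also asks for 3- and 5-day differences. One possible sketch uses the periods argument of groupby diff. Note the assumption: periods counts rows within each group, not calendar days, so this only matches an "n-day" difference if every site/country group has exactly one row per day with no gaps. The column names (1_day_diff and so on) follow the question's expected output and are otherwise arbitrary.

```python
import pandas as pd
from io import StringIO

csv = StringIO("""date,site,country,score
2018-01-01,google,us,100
2018-01-01,google,ch,50
2018-01-02,google,us,70
2018-01-03,google,us,60
2018-01-02,google,ch,10
2018-01-01,fb,us,50
2018-01-02,fb,us,55
2018-01-03,fb,us,100
2018-01-01,fb,es,100
2018-01-02,fb,gb,100
""")
df = pd.read_csv(csv, parse_dates=["date"])

# Sort so that rows within each site/country group are in date order.
df = df.sort_values(["site", "country", "date"])

for n in (1, 3, 5):
    # diff(periods=n) subtracts the value n rows earlier in the same group;
    # rows without such a predecessor become NaN, which is replaced by 0 here.
    df[f"{n}_day_diff"] = (
        df.groupby(["site", "country"])["score"].diff(periods=n).fillna(0)
    )

print(df)
```

With this sample data no group has more than three rows, so the 3- and 5-day columns come out as all zeros. If dates can be missing, the rows would need to be reindexed to a full daily range per group first.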
By default, sort_values orders strings alphabetically, so fb comes before google. If you need a custom order (google before fb, for example), list the values in the desired order in a collection and convert the column to a categorical with that order. sort_values will then respect the ordering you provided.
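As a minimal sketch of the categorical approach, the snippet below uses a small made-up frame and a hypothetical site_order list; neither comes from the original answer.

```python
import pandas as pd

# Small illustrative frame (not the full data from the question).
df = pd.DataFrame({
    "site": ["fb", "google", "fb", "google"],
    "country": ["us", "us", "es", "ch"],
    "score": [50, 100, 100, 50],
})

# Hypothetical custom order: google should sort before fb.
site_order = ["google", "fb"]

# An ordered categorical makes sort_values follow the category order
# instead of alphabetical order.
df["site"] = pd.Categorical(df["site"], categories=site_order, ordered=True)

df = df.sort_values(["site", "country"])
print(df)
```

The country column is left as plain strings here, so within each site the rows still sort alphabetically by country.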