Pandas groupby multiple fields then diff
Question
So my dataframe looks like this:
date site country score
0 2018-01-01 google us 100
1 2018-01-01 google ch 50
2 2018-01-02 google us 70
3 2018-01-03 google us 60
4 2018-01-02 google ch 10
5 2018-01-01 fb us 50
6 2018-01-02 fb us 55
7 2018-01-03 fb us 100
8 2018-01-01 fb es 100
9 2018-01-02 fb gb 100
Each site has a different score depending on the country. I'm trying to find the 1/3/5-day difference of scores for each site/country combination.
The output should be:
date site country score diff
8 2018-01-01 fb es 100 0.0
9 2018-01-02 fb gb 100 0.0
5 2018-01-01 fb us 50 0.0
6 2018-01-02 fb us 55 5.0
7 2018-01-03 fb us 100 45.0
1 2018-01-01 google ch 50 0.0
4 2018-01-02 google ch 10 -40.0
0 2018-01-01 google us 100 0.0
2 2018-01-02 google us 70 -30.0
3 2018-01-03 google us 60 -10.0
I first tried sorting by site/country/date, then grouping by site and country, but I can't work out how to get a difference from a grouped object.
Recommended answer
First, sort the DataFrame; after that, all you need is groupby.diff():
df = df.sort_values(by=['site', 'country', 'date'])
df['diff'] = df.groupby(['site', 'country'])['score'].diff().fillna(0)
df
Out:
date site country score diff
8 2018-01-01 fb es 100 0.0
9 2018-01-02 fb gb 100 0.0
5 2018-01-01 fb us 50 0.0
6 2018-01-02 fb us 55 5.0
7 2018-01-03 fb us 100 45.0
1 2018-01-01 google ch 50 0.0
4 2018-01-02 google ch 10 -40.0
0 2018-01-01 google us 100 0.0
2 2018-01-02 google us 70 -30.0
3 2018-01-03 google us 60 -10.0
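The question also asks for 3- and 5-day differences, which the snippet above does not cover. The following is a minimal sketch, assuming each site/country group has exactly one row per day, because diff counts rows within a group, not calendar days. It rebuilds the sample data and passes periods to diff; the column names diff_1, diff_3 and diff_5 are made up for illustration.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2018-01-01", "2018-01-01", "2018-01-02", "2018-01-03", "2018-01-02",
        "2018-01-01", "2018-01-02", "2018-01-03", "2018-01-01", "2018-01-02",
    ]),
    "site": ["google"] * 5 + ["fb"] * 5,
    "country": ["us", "ch", "us", "us", "ch", "us", "us", "us", "es", "gb"],
    "score": [100, 50, 70, 60, 10, 50, 55, 100, 100, 100],
})

# Sort so that rows within each site/country group are in date order.
df = df.sort_values(by=["site", "country", "date"])

# One column per lag; diff_N is an illustrative name, not from the original answer.
for n in (1, 3, 5):
    df[f"diff_{n}"] = (
        df.groupby(["site", "country"])["score"].diff(periods=n).fillna(0)
    )

print(df)
```

With this sample data no group has more than three rows, so diff_3 and diff_5 are all 0 after fillna(0); diff_1 matches the diff column shown above. If dates can be missing within a group, a row-based diff will not correspond to a fixed number of days, and the data would need to be reindexed to a daily frequency first.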
By default, sort_values orders string columns lexicographically, so fb comes before google. If you need a custom order (google before fb, for example), list the values in the desired order in a collection and convert the column to an ordered categorical. sort_values will then respect the order you provided.
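As a rough illustration of the categorical approach, here is a small sketch on a cut-down frame. The site_order list and the reduced data are assumptions made for the example, not part of the original answer.

```python
import pandas as pd

# Cut-down frame, for illustration only.
df = pd.DataFrame({
    "site": ["fb", "google", "fb", "google"],
    "score": [50, 100, 55, 70],
})

# Desired order of the values: google first, then fb (hypothetical choice).
site_order = ["google", "fb"]

# An ordered categorical sorts by category position, not alphabetically.
df["site"] = pd.Categorical(df["site"], categories=site_order, ordered=True)
df = df.sort_values(by="site")

print(df)
```

After the conversion, the google rows sort ahead of the fb rows, and the same column can still be used as a groupby key.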