Pandas groupby 多个字段然后比较 [英] Pandas groupby multiple fields then diff

查看:235
本文介绍了Pandas groupby 多个字段然后比较的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

所以我的数据框看起来像这样:

So my dataframe looks like this:

         date    site country  score
0  2018-01-01  google      us    100
1  2018-01-01  google      ch     50
2  2018-01-02  google      us     70
3  2018-01-03  google      us     60
4  2018-01-02  google      ch     10
5  2018-01-01      fb      us     50
6  2018-01-02      fb      us     55
7  2018-01-03      fb      us    100
8  2018-01-01      fb      es    100
9  2018-01-02      fb      gb    100

每个 site 都有不同的分数,具体取决于 country.我正在尝试为每个 site/country 组合找到 score 的 1/3/5 天差异.

Each site has a different score depending on the country. I'm trying to find the 1/3/5-day difference of scores for each site/country combination.

输出应该是:

          date    site country  score  diff
8  2018-01-01      fb      es    100   0.0
9  2018-01-02      fb      gb    100   0.0
5  2018-01-01      fb      us     50   0.0
6  2018-01-02      fb      us     55   5.0
7  2018-01-03      fb      us    100  45.0
1  2018-01-01  google      ch     50   0.0
4  2018-01-02  google      ch     10 -40.0
0  2018-01-01  google      us    100   0.0
2  2018-01-02  google      us     70 -30.0
3  2018-01-03  google      us     60 -10.0

我首先尝试按 site/country/date 排序,然后按 site 分组country 但我无法从分组对象中获得差异.

I first tried sorting by site/country/date, then grouping by site and country but I'm not able to wrap my head around getting a difference from a grouped object.

推荐答案

首先对DataFrame进行排序,然后你只需要groupby.diff():

First, sort the DataFrame and then all you need is groupby.diff():

df = df.sort_values(by=['site', 'country', 'date'])

df['diff'] = df.groupby(['site', 'country'])['score'].diff().fillna(0)

df
Out: 
         date    site country  score  diff
8  2018-01-01      fb      es    100   0.0
9  2018-01-02      fb      gb    100   0.0
5  2018-01-01      fb      us     50   0.0
6  2018-01-02      fb      us     55   5.0
7  2018-01-03      fb      us    100  45.0
1  2018-01-01  google      ch     50   0.0
4  2018-01-02  google      ch     10 -40.0
0  2018-01-01  google      us    100   0.0
2  2018-01-02  google      us     70 -30.0
3  2018-01-03  google      us     60 -10.0

sort_values 不支持任意排序.如果您需要任意排序(例如在 fb 之前使用 google),您需要将它们存储在一个集合中并将您的列设置为分类.然后 sort_values 将尊重您在那里提供的顺序.

sort_values doesn't support arbitrary orderings. If you need to sort arbitrarily (google before fb for example) you need to store them in a collection and set your column as categorical. Then sort_values will respect the ordering you provided there.

这篇关于Pandas groupby 多个字段然后比较的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆