How to have SQL style aggregation in pandas dataframe


Problem description

I would like to perform an SQL-style aggregation in Python.

import pandas as pd

# Example DataFrame
df = pd.DataFrame({'ID':[1,1,2,2,2],
                   'revenue':[1,3,5,1,5],
                   'month':['2012-01-01','2012-01-01','2012-03-01','2014-01-01','2012-01-01']})

print(df)
   ID       month  revenue
0   1  2012-01-01        1
1   1  2012-01-01        3
2   2  2012-03-01        5
3   2  2014-01-01        1
4   2  2012-01-01        5

Now I would like to calculate the total revenue, the number of unique months, and the first month for every ID. I get the numbers I want, but not the column names I want: the aggregated columns come back with a two-level header, so the names are spread across two rows.

df = df.groupby(['ID']).agg({'revenue':'sum','month':['nunique','first']}).reset_index()
print(df)    
  ID revenue   month            
         sum nunique       first
0  1       4       1  2012-01-01
1  2      11       3  2012-03-01

A normal SQL script would look something like the following pseudocode:

select ID, sum(revenue) as revenue, count(distinct month) as distinct_m, first(month) as first_m from table group by ID ...

My desired output:

   ID    revenue  distinct_m     first_m
0  1           4           1  2012-01-01
1  2          11           3  2012-03-01


Recommended answer

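You can try named aggregation (available in pandas 0.25 and later). Each keyword argument names an output column and maps it to a (source column, aggregation function) tuple, so the result has a single, flat header row: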

df.groupby('ID').agg(revenue = ('revenue','sum'),
                     distinct_m = ('month','nunique'),
                     first_m = ('month','first')).reset_index()

   ID  revenue  distinct_m     first_m
0   1        4           1  2012-01-01
1   2       11           3  2012-03-01
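
For convenience, here is a minimal self-contained sketch of the answer above, using the example data from the question. The variable name `result` is an assumption for illustration; the original snippets reassign `df` instead.

```python
import pandas as pd

# Same example data as in the question
df = pd.DataFrame({
    'ID': [1, 1, 2, 2, 2],
    'revenue': [1, 3, 5, 1, 5],
    'month': ['2012-01-01', '2012-01-01', '2012-03-01',
              '2014-01-01', '2012-01-01'],
})

# Named aggregation: output_column=(source_column, aggregation_function).
# `result` is a hypothetical name; keeping `df` untouched lets you rerun this.
result = (
    df.groupby('ID')
      .agg(revenue=('revenue', 'sum'),        # like sum(revenue) as revenue
           distinct_m=('month', 'nunique'),   # like count(distinct month)
           first_m=('month', 'first'))        # first value in row order
      .reset_index()                          # turn ID back into a column
)

print(result)
```

Note that 'first' returns the first value in row order within each group, not the earliest date. If the earliest month is what you need, 'min' may be a better fit; for ID 2 it would give 2012-01-01 rather than 2012-03-01.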
