Pandas groupby 和聚合输出应包括所有原始列(包括未聚合的列) [英] Pandas groupby and aggregation output should include all the original columns (including the ones not aggregated on)

查看：64 发布时间：2021/12/26 23:59:20 python pandas dataframe group-by pandas-groupby

本文介绍了Pandas groupby 和聚合输出应包括所有原始列(包括未聚合的列)的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我有以下数据框并且想要:

I have the following data frame and want to:

按月份
每个唯一UPC_ID(每月)的QTY_SOLD和NET_AMT总和
在结果数据框中也包括其余的列

我认为我能做到的方法是第一:创建一个 month 列来聚合 D_DATES，然后将 QTY_SOLD 与 相加>UPC_ID.

The way I thought I can do this is 1st: create a month column to aggregate the D_DATES, then sum QTY_SOLD by UPC_ID.

脚本:

# Convert date to date time object
df['D_DATE'] = pd.to_datetime(df['D_DATE'])

# Create aggregated months column
df['month'] = df['D_DATE'].apply(dt.date.strftime, args=('%Y.%m',))

# Group by month and sum up quantity sold by UPC_ID
df = df.groupby(['month', 'UPC_ID'])['QTY_SOLD'].sum()

<小时>

当前数据框:

Current data frame:

UPC_ID | UPC_DSC | D_DATE | QTY_SOLD | NET_AMT
----------------------------------------------
111      desc1    2/26/2017   2         10 (2 x $5)
222      desc2    2/26/2017   3         15
333      desc3    2/26/2017   1         4
111      desc1    3/1/2017    1         5
111      desc1    3/3/2017    4         20

期望的输出:

MONTH | UPC_ID | QTY_SOLD | NET_AMT | UPC_DSC
----------------------------------------------
2017-2      111     2         10       etc...
2017-2      222     3         15
2017-2      333     1         4
2017-3      111     5         25

实际输出:

MONTH | UPC_ID  
----------------------------------------------
2017-2      111     2
            222     3
            333     1
2017-3      111     5
...

问题:

如何在每一行中包含月份?
如何包含数据框的其余列?
除了QTY_SOLD之外，还如何求和NET_AMT?

How do I include the month for each row?
How do I include the rest of the columns of the dataframe?
How do also sum NET_AMT in addition to QTY_SOLD?

`agg` with a `dict` of functions

创建一个dict函数并将其传递给agg.您还需要 as_index=False 以防止组列成为输出中的索引.

`agg` with a `dict` of functions

Create a dict of functions and pass it to agg. You'll also need as_index=False to prevent the group columns from becoming the index in your output.

f = {'NET_AMT': 'sum', 'QTY_SOLD': 'sum', 'UPC_DSC': 'first'}
df.groupby(['month', 'UPC_ID'], as_index=False).agg(f)

     month  UPC_ID UPC_DSC  NET_AMT  QTY_SOLD
0  2017.02     111   desc1       10         2
1  2017.02     222   desc2       15         3
2  2017.02     333   desc3        4         1
3  2017.03     111   desc1       25         5

Blanket `sum`

只需调用 sum 无需任何列名.这处理数字列.对于UPC_DSC，您需要单独处理.

Blanket `sum`

Just call sum without any column names. This handles the numeric columns. For UPC_DSC, you'll need to handle it separately.

g = df.groupby(['month', 'UPC_ID'])
i = g.sum()
j = g[['UPC_DSC']].first()

pd.concat([i, j], 1).reset_index()

     month  UPC_ID  QTY_SOLD  NET_AMT UPC_DSC
0  2017.02     111         2       10   desc1
1  2017.02     222         3       15   desc2
2  2017.02     333         1        4   desc3
3  2017.03     111         5       25   desc1

这篇关于Pandas groupby 和聚合输出应包括所有原始列(包括未聚合的列)的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

Pandas groupby 和聚合输出应包括所有原始列(包括未聚合的列) [英] Pandas groupby and aggregation output should include all the original columns (including the ones not aggregated on)

问题描述

推荐答案

`agg` with a `dict` of functions

`agg` with a `dict` of functions

Blanket `sum`

Blanket `sum`

相关文章

Python最新文章

热门教程

热门工具

登录关闭

Pandas groupby 和聚合输出应包括所有原始列(包括未聚合的列) [英] Pandas groupby and aggregation output should include all the original columns (including the ones not aggregated on)

问题描述

推荐答案

agg with a dict of functions

agg with a dict of functions

Blanket sum

Blanket sum

相关文章

Python最新文章

热门教程

热门工具

登录 关闭

`agg` with a `dict` of functions

`agg` with a `dict` of functions

Blanket `sum`

Blanket `sum`

登录关闭