Pandas groupby with identification of an element with max value in another column


Question

I have a dataframe with sales results of items with different pricing rules:

import pandas as pd
from datetime import timedelta
df_1 = pd.DataFrame()
df_2 = pd.DataFrame()
df_3 = pd.DataFrame()

# Create datetimes and data
df_1['item'] = [1, 1, 2, 2, 2]
df_1['date'] = pd.date_range('1/1/2018', periods=5, freq='D')
df_1['price_rule'] = ['a', 'b', 'a', 'b', 'b']
df_1['sales']= [2, 4, 1, 5, 7]
df_1['clicks']= [7, 8, 9, 10, 11]

df_2['item'] = [1, 1, 2, 2, 2]
df_2['date'] = pd.date_range('1/1/2018', periods=5, freq='D')
df_2['price_rule'] = ['b', 'b', 'a', 'a', 'a']
df_2['sales']= [2, 3, 4, 5, 6]
df_2['clicks']= [7, 8, 9, 10, 11]

df_3['item'] = [1, 1, 2, 2, 2]
df_3['date'] = pd.date_range('1/1/2018', periods=5, freq='D')
df_3['price_rule'] = ['b', 'a', 'b', 'a', 'b']
df_3['sales']= [6, 5, 4, 5, 6]
df_3['clicks']= [7, 8, 9, 10, 11]

df = pd.concat([df_1, df_2, df_3])
df = df.sort_values(['item', 'date'])
df.reset_index(drop=True)  # result is not assigned, so df keeps its repeated 0..4 index
df

The result is:

    item    date    price_rule  sales   clicks
0   1   2018-01-01       a       2       7
0   1   2018-01-01       b       2       7
0   1   2018-01-01       b       6       7
1   1   2018-01-02       b       4       8
1   1   2018-01-02       b       3       8
1   1   2018-01-02       a       5       8
2   2   2018-01-03       a       1       9
2   2   2018-01-03       a       4       9
2   2   2018-01-03       b       4       9
3   2   2018-01-04       b       5       10
3   2   2018-01-04       a       5       10
3   2   2018-01-04       a       5       10
4   2   2018-01-05       b       7       11
4   2   2018-01-05       a       6       11
4   2   2018-01-05       b       6       11

My goal is to:
1. group all items by day (to get a single row for each item and day)
2. aggregate 'clicks' with "sum"
3. generate a "winning_pricing_rule" column as follows:
- for a given item and date, take the pricing rule with the highest 'sales' value
- in case of a draw (e.g. item 2 on 2018-01-03 in the sample above), choose just one of them (that's rare in my dataset, so it can be random)

I imagine the result to look like this:

  item  date       winning_price_rule   clicks
0   1   2018-01-01      b               21
1   1   2018-01-02      a               24
2   2   2018-01-03      b               27  <<remark: could also be a (due to draw)
3   2   2018-01-04      a               30  <<remark: could also be b (due to draw)
4   2   2018-01-05      b               33

I tried:

a.groupby(['item', 'date'], as_index = False).agg({'sales':'sum','revenue':'max'})

but failed to identify a winning pricing rule.

Any ideas? Many Thanks for help :)

Andy

Recommended answer

First move the price_rule column into the index with DataFrame.set_index. DataFrameGroupBy.idxmax then returns the index value (the pricing rule) of the row with the maximum sales in each group. Call it through GroupBy.agg, because clicks must be summed in the same step:

df1 = (df.set_index('price_rule')
         .groupby(['item', 'date'])
         .agg({'sales':'idxmax', 'clicks':'sum'})
         .reset_index())
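
To see why the pricing rule has to be the index, here is a minimal sketch on a single hypothetical group (the values mirror item 2 on 2018-01-03 from the question). idxmax returns an index label rather than a value, and on a tie it returns the label of the first row holding the maximum:

```python
import pandas as pd

# One (item, date) group: sales values labelled by their pricing rule.
sales = pd.Series([1, 4, 4],
                  index=pd.Index(['a', 'a', 'b'], name='price_rule'))

# idxmax gives the index label at the first position of the maximum,
# so the tie between 'a' and 'b' resolves to the earlier row.
winner = sales.idxmax()
print(winner)  # a
```

Duplicate labels in the index are not a problem here, because only the label at the winning position is read back.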

With pandas 0.25+ you can use named aggregation, which also sets the output column names:

df1 = (df.set_index('price_rule')
         .groupby(['item', 'date'])
         .agg(winning_pricing_rule=pd.NamedAgg(column='sales', aggfunc='idxmax'),
              clicks=pd.NamedAgg(column='clicks', aggfunc='sum'))
         .reset_index())
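
As a quick check, the sketch below runs the same idea on a trimmed copy of the sample data (only 2018-01-01 for item 1 and 2018-01-03 for item 2). It uses the tuple shorthand for named aggregation, which is assumed here to behave like pd.NamedAgg, and names the output column winning_price_rule as in the expected result from the question:

```python
import pandas as pd

# Trimmed sample: two (item, date) groups with three pricing-rule rows each.
df = pd.DataFrame({
    'item': [1, 1, 1, 2, 2, 2],
    'date': pd.to_datetime(['2018-01-01'] * 3 + ['2018-01-03'] * 3),
    'price_rule': ['a', 'b', 'b', 'a', 'a', 'b'],
    'sales': [2, 2, 6, 1, 4, 4],
    'clicks': [7, 7, 7, 9, 9, 9],
})

result = (df.set_index('price_rule')             # rule becomes the row label
            .groupby(['item', 'date'])
            .agg(winning_price_rule=('sales', 'idxmax'),  # label of max sales
                 clicks=('clicks', 'sum'))                # total clicks
            .reset_index())
print(result)
```

For item 1 the winner is 'b' (sales of 6) with 21 clicks. For item 2 the top sales value is shared by 'a' and 'b', so either rule is an acceptable winner, and clicks sum to 27.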
