Python Pandas groupby for loop & idxmax


Problem description

I have a DataFrame that must be grouped on three levels, and I want the row with the highest value returned for each group. Each day there is a return for each unique combination, and I would like to find the highest return and its details.

data.groupby(['Company','Product','Industry'])['ROI'].idxmax()

The result should show that these were the highest in their groups:

Target   - Dish Soap - House       had a 5% ROI on 9/17
Best Buy - CDs       - Electronics had a 3% ROI on 9/3

Here's some example data:

+----------+-----------+-------------+---------+-----+
| Company  | Product   | Industry    | Date    | ROI |
+----------+-----------+-------------+---------+-----+
| Target   | Dish Soap | House       | 9/17/13 | 5%  |
| Target   | Dish Soap | House       | 9/16/13 | 2%  |
| BestBuy  | CDs       | Electronics | 9/1/13  | 1%  |
| BestBuy  | CDs       | Electronics | 9/3/13  | 3%  |
| ...
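Here is a minimal sketch of the sample table as a DataFrame. It makes three assumptions. The first column is Company, the name used in the groupby call. The "Electroincs" typo is corrected, because it would otherwise form a group of its own. ROI is stored as strings such as "5%". Converting those strings to numbers before calling idxmax matters, because strings compare as text, and "10%" sorts before "5%".

```python
import pandas as pd

# Sample rows from the table; the first column is assumed to be Company.
data = pd.DataFrame({
    "Company":  ["Target", "Target", "BestBuy", "BestBuy"],
    "Product":  ["Dish Soap", "Dish Soap", "CDs", "CDs"],
    "Industry": ["House", "House", "Electronics", "Electronics"],
    "Date":     pd.to_datetime(["9/17/13", "9/16/13", "9/1/13", "9/3/13"], format="%m/%d/%y"),
    "ROI":      ["5%", "2%", "1%", "3%"],
})

# Turn "5%" into 5.0 so that max/idxmax compare numbers, not text.
data["ROI"] = data["ROI"].str.rstrip("%").astype(float)

print(data.dtypes)
```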

I'm not sure whether this needs a for loop or .ix.
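For reference, the for-loop version the question wonders about would look roughly like this. It iterates over the groups and keeps the index label of each group's maximum. It produces the same labels as a single `groupby(...).idxmax()` call, so the explicit loop is unnecessary. Also note that `.ix` was deprecated in pandas 0.20 and removed in 1.0, and `.loc` is its label-based replacement. The sketch assumes ROI is already numeric.

```python
import pandas as pd

data = pd.DataFrame({
    "Company":  ["Target", "Target", "BestBuy", "BestBuy"],
    "Product":  ["Dish Soap", "Dish Soap", "CDs", "CDs"],
    "Industry": ["House", "House", "Electronics", "Electronics"],
    "ROI":      [5.0, 2.0, 1.0, 3.0],
})
keys = ["Company", "Product", "Industry"]

# Explicit loop: one pass per (Company, Product, Industry) group.
loop_labels = {}
for group_key, group in data.groupby(keys):
    loop_labels[group_key] = group["ROI"].idxmax()

# Vectorized equivalent: a Series mapping each group key to its row label.
vectorized = data.groupby(keys)["ROI"].idxmax()

print(loop_labels)
print(vectorized.to_dict())
```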

Solution

I think, if I understand you correctly, you could collect the index values in a Series using groupby and idxmax(), and then select those rows from data using loc:

idx = data.groupby(['Company','Product','Industry'])['ROI'].idxmax()
data.loc[idx]
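The loc approach can be run end to end on the sample rows. This is a small sketch that assumes ROI has already been converted to numbers. `idx` holds row labels, not rows. `.loc` then treats the values of `idx` as labels and returns the full rows, including Date.

```python
import pandas as pd

data = pd.DataFrame({
    "Company":  ["Target", "Target", "BestBuy", "BestBuy"],
    "Product":  ["Dish Soap", "Dish Soap", "CDs", "CDs"],
    "Industry": ["House", "House", "Electronics", "Electronics"],
    "Date":     ["9/17/13", "9/16/13", "9/1/13", "9/3/13"],
    "ROI":      [5.0, 2.0, 1.0, 3.0],
})

# idx is a Series: group key -> index label of that group's highest ROI.
idx = data.groupby(["Company", "Product", "Industry"])["ROI"].idxmax()

# .loc uses the values of idx as row labels and returns the full rows.
best = data.loc[idx]
print(best)
```

The rows come back in group-key order (BestBuy before Target) because groupby sorts the keys by default. This relies on the DataFrame having a unique index. If index labels repeat, `.loc` returns every row carrying a matching label.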

Another option is to use reindex:

data.reindex(idx)
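As a quick check, this sketch (with ROI assumed to be numeric) confirms that reindex selects the same rows as loc. Both use the values of `idx` as target row labels. The two differ on labels that do not exist: reindex fills them with NaN rows instead of raising. Since idxmax only returns labels taken from the frame itself, that case does not arise here.

```python
import pandas as pd

data = pd.DataFrame({
    "Company":  ["Target", "Target", "BestBuy", "BestBuy"],
    "Product":  ["Dish Soap", "Dish Soap", "CDs", "CDs"],
    "Industry": ["House", "House", "Electronics", "Electronics"],
    "ROI":      [5.0, 2.0, 1.0, 3.0],
})
idx = data.groupby(["Company", "Product", "Industry"])["ROI"].idxmax()

via_loc = data.loc[idx]
# reindex also treats the values of idx as the target row labels.
via_reindex = data.reindex(idx)

print(via_loc.index.tolist(), via_reindex.index.tolist())
```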

On a (different) dataframe I happened to have handy, it appears reindex might be the faster option:

In [39]: %timeit df.reindex(idx)
10000 loops, best of 3: 121 us per loop

In [40]: %timeit df.loc[idx]
10000 loops, best of 3: 147 us per loop
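`%timeit` is an IPython magic. In a plain script, a comparable measurement can be sketched with the standard-library `timeit` module. The frame here is a tiny stand-in, and the loop count is smaller than above to keep the run short. The numbers will vary with data size, hardware and pandas version, so they should not be read as a general ranking.

```python
import timeit

import pandas as pd

data = pd.DataFrame({
    "Company":  ["Target", "Target", "BestBuy", "BestBuy"],
    "Product":  ["Dish Soap", "Dish Soap", "CDs", "CDs"],
    "Industry": ["House", "House", "Electronics", "Electronics"],
    "ROI":      [5.0, 2.0, 1.0, 3.0],
})
idx = data.groupby(["Company", "Product", "Industry"])["ROI"].idxmax()

# Best of 3 repeats, like %timeit, but with fewer calls per repeat.
n = 1_000
reindex_s = min(timeit.repeat(lambda: data.reindex(idx), number=n, repeat=3)) / n
loc_s = min(timeit.repeat(lambda: data.loc[idx], number=n, repeat=3)) / n

print(f"reindex: {reindex_s * 1e6:.1f} us per loop")
print(f"loc:     {loc_s * 1e6:.1f} us per loop")
```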
