Python Pandas groupby for loop & idxmax
I have a DataFrame that must be grouped on three levels, returning the highest value in each group. There is one return per day for each unique combination, and I would like to find the highest return along with its details (such as the date).
data.groupby(['Company','Product','Industry'])['ROI'].idxmax()
The result should show the highest return in each group:
Target - Dish Soap - House had a 5% ROI on 9/17
Best Buy - CDs - Electronics had a 3% ROI on 9/3
Here's some example data:
+----------+-----------+-------------+---------+-----+
| Company  | Product   | Industry    | Date    | ROI |
+----------+-----------+-------------+---------+-----+
| Target | Dish Soap | House | 9/17/13 | 5% |
| Target | Dish Soap | House | 9/16/13 | 2% |
| BestBuy | CDs | Electronics | 9/1/13 | 1% |
| BestBuy | CDs | Electronics | 9/3/13 | 3% |
| ...
Not sure if this would be a for loop, or using .ix.
I think, if I understand you correctly, you could collect the index values in a Series using groupby and idxmax(), and then select those rows from data using loc:
idx = data.groupby(['Company','Product','Industry'])['ROI'].idxmax()
data.loc[idx]
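To make this concrete, here is a minimal sketch built on the four sample rows from the question. It assumes the first column is named Company (as in the groupby call) and that ROI arrives as percent strings such as "5%", which would need converting to numbers before idxmax can compare them; both are assumptions about the real data.

```python
import pandas as pd

# Sample rows from the question; column names are assumed to match the groupby call.
data = pd.DataFrame({
    "Company": ["Target", "Target", "BestBuy", "BestBuy"],
    "Product": ["Dish Soap", "Dish Soap", "CDs", "CDs"],
    "Industry": ["House", "House", "Electronics", "Electronics"],
    "Date": ["9/17/13", "9/16/13", "9/1/13", "9/3/13"],
    "ROI": ["5%", "2%", "1%", "3%"],
})

# Assumption: ROI is stored as text like "5%"; strip the sign and convert to float.
data["ROI"] = data["ROI"].str.rstrip("%").astype(float)

# For each (Company, Product, Industry) group, get the row label of the highest ROI.
idx = data.groupby(["Company", "Product", "Industry"])["ROI"].idxmax()

# Use those labels to pull the full rows, including the Date.
best = data.loc[idx]
print(best)
```

No for loop is needed: idxmax returns one row label per group, and loc fetches the matching rows in a single step.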
Another option is to use reindex:
data.reindex(idx)
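The two approaches return the same rows when every label exists, but they differ when a label is missing. The sketch below illustrates this on the sample data (with ROI already numeric, an assumption); the label 99 is a made-up value used only to show the difference, and the KeyError behaviour applies to recent pandas versions.

```python
import pandas as pd

data = pd.DataFrame({
    "Company": ["Target", "Target", "BestBuy", "BestBuy"],
    "Product": ["Dish Soap", "Dish Soap", "CDs", "CDs"],
    "Industry": ["House", "House", "Electronics", "Electronics"],
    "Date": ["9/17/13", "9/16/13", "9/1/13", "9/3/13"],
    "ROI": [5.0, 2.0, 1.0, 3.0],  # assumed to be numeric already
})
idx = data.groupby(["Company", "Product", "Industry"])["ROI"].idxmax()

# With labels that all exist, both calls select the same rows.
labels = idx.to_numpy()
via_loc = data.loc[labels]
via_reindex = data.reindex(labels)

# With a label that does not exist (99 is hypothetical), reindex pads with NaN...
padded = data.reindex([0, 99])

# ...while loc raises KeyError in recent pandas versions.
try:
    data.loc[[0, 99]]
    loc_raised = False
except KeyError:
    loc_raised = True
```

Since idxmax only returns labels that are present in data, either call is safe here; the difference matters only if the labels come from somewhere else.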
On a (different) DataFrame I happened to have handy, it appears reindex might be the faster option:
In [39]: %timeit df.reindex(idx)
10000 loops, best of 3: 121 us per loop
In [40]: %timeit df.loc[idx]
10000 loops, best of 3: 147 us per loop
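The timings above come from a DataFrame that is not shown, so they may not carry over to other data or pandas versions. A rough way to check on your own machine, outside IPython, is sketched below; the random frame, its column names (group, value), and the sizes are all assumptions chosen for illustration.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test frame: 10,000 rows spread over up to 100 groups.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "group": rng.integers(0, 100, size=10_000),
    "value": rng.random(10_000),
})
idx = df.groupby("group")["value"].idxmax()

# Time each selection method over the same number of calls.
n_calls = 200
t_reindex = timeit.timeit(lambda: df.reindex(idx), number=n_calls)
t_loc = timeit.timeit(lambda: df.loc[idx], number=n_calls)

print(f"reindex: {t_reindex / n_calls * 1e6:.0f} us per call")
print(f"loc:     {t_loc / n_calls * 1e6:.0f} us per call")
```

Which method comes out ahead can vary, so it is worth measuring on data shaped like your own before choosing on speed alone.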