Return a series of numeric positions for the indices representing the max within each group


Problem description

Consider the series:

import numpy as np
import pandas as pd

np.random.seed([3,1415])
s = pd.Series(np.random.rand(100),
              pd.MultiIndex.from_product([list('ABDCE'),
                                          list('abcde'),
                                          ['One', 'Two', 'Three', 'Four']]))

I can group by combinations of index levels and get the idxmax:

s.groupby(level=[0, 2]).idxmax()

A  Four      (A, c, Four)
   One        (A, d, One)
   Three    (A, c, Three)
   Two        (A, d, Two)
B  Four      (B, d, Four)
   One        (B, d, One)
   Three    (B, c, Three)
   Two        (B, b, Two)
C  Four      (C, b, Four)
   One        (C, a, One)
   Three    (C, a, Three)
   Two        (C, e, Two)
D  Four      (D, b, Four)
   One        (D, e, One)
   Three    (D, b, Three)
   Two        (D, c, Two)
E  Four      (E, c, Four)
   One        (E, a, One)
   Three    (E, c, Three)
   Two        (E, a, Two)
dtype: object

I want the numeric position of each of these within each group.

Thanks to the awesome answer to this question, I can get their positions within the whole series:

s.groupby(level=[0, 2]).idxmax().apply(lambda x: s.index.get_loc(x))

A  Four     11
   One      12
   Three    10
   Two      13
B  Four     35
   One      32
   Three    30
   Two      25
C  Four     67
   One      60
   Three    62
   Two      77
D  Four     47
   One      56
   Three    46
   Two      49
E  Four     91
   One      80
   Three    90
   Two      81
dtype: int64

But I want this:

A  Four     2
   One      3
   Three    2
   Two      3
B  Four     3
   One      3
   Three    2
   Two      1
C  Four     1
   One      0
   Three    0
   Two      4
D  Four     1
   One      4
   Three    1
   Two      2
E  Four     2
   One      0
   Three    2
   Two      0
dtype: int64
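For this particular series the two outputs are related by simple arithmetic, which may help clarify what "position within each group" means. The sketch below copies the whole-series positions from the get_loc output and converts them. It assumes the index is a full product laid out in product order, with 5 middle-level labels and 4 inner-level labels; the names `n_mid` and `n_inner` are introduced here only for illustration.

```python
import numpy as np

# Positions in the whole series, copied from the get_loc output above
global_pos = np.array([11, 12, 10, 13, 35, 32, 30, 25, 67, 60,
                       62, 77, 47, 56, 46, 49, 91, 80, 90, 81])

n_inner = 4  # labels in the last level: 'One', 'Two', 'Three', 'Four'
n_mid = 5    # labels in the middle level: 'a' .. 'e'

# Dropping the inner-level offset and wrapping by the middle-level size
# leaves the row's position among the 5 rows of its (outer, inner) group.
within_group = (global_pos // n_inner) % n_mid
print(within_group.tolist())
# [2, 3, 2, 3, 3, 3, 2, 1, 1, 0, 0, 4, 1, 4, 1, 2, 2, 0, 2, 0]
```

This shortcut depends entirely on the regular layout of the index; it would not hold for a ragged or shuffled MultiIndex.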

Recommended answer

Well, I finally have a solution. It uses NumPy's reshape and then takes argmax along one of the axes. I am not sure whether it is elegant, but I expect it to perform well. I am also assuming that the Series's MultiIndex has a regular format, i.e. every combination of level values is present, so each level contributes the same number of elements throughout the index.
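The reshape-then-argmax idea can be seen in isolation on a tiny array. This is a minimal sketch with made-up toy values (not the series from the question): 12 numbers are viewed as 2 outer labels x 3 middle labels x 2 inner labels, and argmax over the middle axis gives, for each (outer, inner) pair, the position of the largest value among the middle labels.

```python
import numpy as np

# Hypothetical toy data, laid out in product order:
# outer label varies slowest, inner label varies fastest.
flat = np.array([0.1, 0.9,   0.7, 0.2,   0.3, 0.5,    # outer 0
                 0.8, 0.4,   0.6, 0.95,  0.05, 0.3])  # outer 1

cube = flat.reshape(2, 3, 2)   # (outer, middle, inner)
pos = cube.argmax(axis=1)      # reduce over the middle axis

print(pos.tolist())  # [[1, 0], [0, 1]]
```

Row `i`, column `k` of `pos` is the within-group position for outer label `i` and inner label `k`, which is why the regular-index assumption matters: the reshape only lines up if no combination is missing.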

Here's the implementation:

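# Assumes the MultiIndex is a full Cartesian product of its three levels
L0, L1, L2 = s.index.levels[:3]
# sort_index() replaces the deprecated Series.sortlevel()
IDs = s.sort_index().values.reshape(-1, len(L0), len(L1), len(L2)).argmax(2)
sOut = pd.Series(IDs.ravel(), pd.MultiIndex.from_product([L0, L2]))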
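As a sanity check, the reshape approach can be compared against a plain groupby baseline. The sketch below is self-contained and rebuilds the series from the question; the `slow` baseline (a per-group `argmax` via `groupby(...).apply`) is not part of the original answer and is added here only for verification, as are the names `cube`, `fast`, and `slow`.

```python
import numpy as np
import pandas as pd

np.random.seed([3, 1415])
s = pd.Series(np.random.rand(100),
              pd.MultiIndex.from_product([list('ABDCE'),
                                          list('abcde'),
                                          ['One', 'Two', 'Three', 'Four']]))

# Reshape approach: sort so the rows follow the (sorted) level order,
# view the values as (outer, middle, inner), reduce over the middle axis.
L0, L1, L2 = s.index.levels[:3]
cube = s.sort_index().to_numpy().reshape(len(L0), len(L1), len(L2))
fast = pd.Series(cube.argmax(axis=1).ravel(),
                 pd.MultiIndex.from_product([L0, L2]))

# Baseline for comparison (assumption: added for checking only):
# position of the max inside each (outer, inner) group.
slow = s.groupby(level=[0, 2]).apply(lambda g: int(g.to_numpy().argmax()))

print((fast.to_numpy() == slow.to_numpy()).all())
```

The baseline makes no regularity assumption, so it is the safer choice when some level combinations might be missing; the reshape version trades that generality for vectorized speed.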


Timing (compliments of pir)
