Pandas second largest value's column name
Problem description
I am trying to find the column names associated with the largest and second-largest values in each row of a DataFrame. Here's a simplified example (the real one has over 500 columns):
Date val1 val2 val3 val4
1990 5 7 1 10
1991 2 1 10 3
1992 10 9 6 1
1993 50 10 2 15
1994 1 15 7 8
needs to become:
Date 1larg 2larg
1990 val4 val2
1991 val3 val4
1992 val1 val2
1993 val1 val4
1994 val2 val4
I can find the column name with the largest value (i.e., 1larg above) with idxmax, but how can I find the second largest?
Recommended answer
(You don't have any duplicate maximum values in your rows, so I'll guess that if you have [1,1,2,2] you want val3 and val4 to be selected.)
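One way would be to use the result of argsort as an index into the column names.
import numpy as np
import pandas as pd
df = df.set_index("Date")
# Column positions of each row, ordered by ascending value
arank = df.apply(np.argsort, axis=1)
# Reverse to descending order, keep the top two, and map positions to names
# (a plain array is indexed here because a Series cannot take a 2-D indexer in recent pandas)
ranked_cols = df.columns.to_numpy()[arank.to_numpy()[:, ::-1][:, :2]]
new_frame = pd.DataFrame(ranked_cols, index=df.index)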
which produces
0 1
Date
1990 val4 val2
1991 val3 val4
1992 val1 val2
1993 val1 val4
1994 val2 val4
1995 val4 val3
(where I've added an extra 1995 [1,1,2,2] row.)
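For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same argsort idea, working on the underlying NumPy array directly. The sample frame is rebuilt from the question's data plus the extra 1995 row; the names order, top2 and result are only illustrative, and the stable sort kind is my assumption for making the tie order in the 1995 row deterministic.

```python
import numpy as np
import pandas as pd

# Sample data from the question, plus the extra 1995 row containing ties
df = pd.DataFrame(
    {
        "Date": [1990, 1991, 1992, 1993, 1994, 1995],
        "val1": [5, 2, 10, 50, 1, 1],
        "val2": [7, 1, 9, 10, 15, 1],
        "val3": [1, 10, 6, 2, 7, 2],
        "val4": [10, 3, 1, 15, 8, 2],
    }
).set_index("Date")

# Column positions of each row, ordered by ascending value.
# kind="stable" (an assumption, not in the original answer) fixes the order of ties.
order = np.argsort(df.to_numpy(), axis=1, kind="stable")

# Reverse each row to descending order and keep the two largest positions
top2 = order[:, ::-1][:, :2]

# Map positions back to column names
result = pd.DataFrame(
    df.columns.to_numpy()[top2],
    index=df.index,
    columns=["1larg", "2larg"],
)
print(result)
```

With 500+ columns this still does a full sort per row. If that turns out to be slow, np.argpartition might be worth trying, though I haven't benchmarked it here.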
Alternatively, you could probably melt the frame into a flat (long) format, pick out the two largest values in each Date group, and then pivot it back.
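A rough sketch of how that melt route might look, assuming Date is still an ordinary column. The helper column names (column, value, place) and the variable names are made up for illustration. The sample has no ties, so tie-breaking is not exercised here and may differ from the argsort version.

```python
import pandas as pd

# Sample data from the question, with Date as a regular column
wide = pd.DataFrame(
    {
        "Date": [1990, 1991, 1992, 1993, 1994],
        "val1": [5, 2, 10, 50, 1],
        "val2": [7, 1, 9, 10, 15],
        "val3": [1, 10, 6, 2, 7],
        "val4": [10, 3, 1, 15, 8],
    }
)

# Flatten to one row per (Date, column) pair
long = wide.melt(id_vars="Date", var_name="column", value_name="value")

# Within each Date, put the largest values first
long = long.sort_values(["Date", "value"], ascending=[True, False])

# Keep the top two rows of each Date group
top2_long = long.groupby("Date").head(2).copy()

# Label the rows of each group as first / second largest
top2_long["place"] = top2_long.groupby("Date").cumcount().map({0: "1larg", 1: "2larg"})

# Turn it back into one row per Date
melt_result = top2_long.pivot(index="Date", columns="place", values="column")
print(melt_result)
```

This is more verbose than the argsort version, but it keeps the values alongside the names, which could be handy if you need both.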