Top 3 Values Per Row in Pandas

Question

I have a large pandas DataFrame that looks like this:

| ID | Var1 | Var2 | Var3 | Var4 | Var5 |
|----|------|------|------|------|------|
| 1  | 1    | 2    | 3    | 4    | 5    |
| 2  | 10   | 9    | 8    | 7    | 6    |
| 3  | 25   | 37   | 41   | 24   | 21   |
| 4  | 102  | 11   | 72   | 56   | 151  |
...

I would like to generate output that looks like this, listing the column names of the 3 highest values in each row:

| ID | 1st Max | 2nd Max | 3rd Max |
|----|---------|---------|---------|
| 1  | Var5    | Var4    | Var3    |
| 2  | Var1    | Var2    | Var3    |
| 3  | Var3    | Var2    | Var1    |
| 4  | Var5    | Var1    | Var3    |
...

I have tried df.idxmax(axis=1), which returns the column name of the first maximum, but I am unsure how to compute the other two.

Any help on this would be truly appreciated, thanks!

Recommended answer

Use numpy.argsort to get the column positions of each row's values in descending order, select the top 3 by indexing, and pass the resulting column names to the DataFrame constructor:

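import numpy as np
import pandas as pd

df = df.set_index('ID')
# argsort on the negated values gives column positions in descending order per row
top3 = np.argsort(-df.values, axis=1)[:, :3]
df = pd.DataFrame(df.columns.values[top3],
                  index=df.index,
                  columns=['1st Max', '2nd Max', '3rd Max']).reset_index()
print(df)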
   ID 1st Max 2nd Max 3rd Max
0   1    Var5    Var4    Var3
1   2    Var1    Var2    Var3
2   3    Var3    Var2    Var1
3   4    Var5    Var1    Var3
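
To make the argsort step easier to follow, here is a minimal sketch on a single row (the values for ID 4 from the question). The helper name top_columns and its k parameter are hypothetical and only serve the illustration. Negating the values turns argsort's ascending order into a descending one, so the first k positions point at the largest values.

```python
import numpy as np

columns = np.array(['Var1', 'Var2', 'Var3', 'Var4', 'Var5'])
row = np.array([102, 11, 72, 56, 151])  # values for ID 4 in the question


def top_columns(values, names, k=3):
    """Return the names belonging to the k largest values (hypothetical helper)."""
    # argsort sorts ascending, so sort the negated values to get descending order
    order = np.argsort(-values)
    # keep the first k positions and map them back to column names
    return names[order[:k]].tolist()


print(top_columns(row, columns))
```

For this row the sorted positions are [4, 0, 2, 3, 1], so the helper returns Var5, Var1 and Var3, which matches the last row of the output above.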

Alternatively, if performance is not important, apply nlargest to each row of the original DataFrame:

c = ['1st Max','2nd Max','3rd Max']
df = (df.set_index('ID')
        .apply(lambda x: pd.Series(x.nlargest(3).index, index=c), axis=1)
        .reset_index())
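
As a rough check that both approaches give the same answer, the following self-contained sketch rebuilds the four sample rows from the question and runs each method on them. The function names top3_argsort and top3_nlargest are hypothetical wrappers introduced here; the sample has no tied values, so the ordering is unambiguous.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'Var1': [1, 10, 25, 102],
    'Var2': [2, 9, 37, 11],
    'Var3': [3, 8, 41, 72],
    'Var4': [4, 7, 24, 56],
    'Var5': [5, 6, 21, 151],
})
labels = ['1st Max', '2nd Max', '3rd Max']


def top3_argsort(frame):
    """Vectorized version: sort column positions per row with NumPy."""
    data = frame.set_index('ID')
    order = np.argsort(-data.to_numpy(), axis=1)[:, :3]
    return pd.DataFrame(data.columns.to_numpy()[order],
                        index=data.index, columns=labels).reset_index()


def top3_nlargest(frame):
    """Row-wise version: call nlargest on every row via apply."""
    return (frame.set_index('ID')
                 .apply(lambda x: pd.Series(x.nlargest(3).index, index=labels), axis=1)
                 .reset_index())


fast = top3_argsort(df)
slow = top3_nlargest(df)

# Compare only the label columns, as plain Python lists
same = fast[labels].to_numpy().tolist() == slow[labels].to_numpy().tolist()
print(fast)
print('Both approaches agree:', same)
```

The argsort version works on the whole array at once, while the apply version builds a Series per row, which is why the answer reserves it for cases where speed does not matter.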
