Top 3 Values Per Row in Pandas
Question
I have a large pandas DataFrame along the lines of:
| ID | Var1 | Var2 | Var3 | Var4 | Var5 |
|----|------|------|------|------|------|
| 1 | 1 | 2 | 3 | 4 | 5 |
| 2 | 10 | 9 | 8 | 7 | 6 |
| 3 | 25 | 37 | 41 | 24 | 21 |
| 4 | 102 | 11 | 72 | 56 | 151 |
...
and I would like to generate output that looks like this, containing the column names of the 3 highest values in each row:
| ID | 1st Max | 2nd Max | 3rd Max |
|----|---------|---------|---------|
| 1 | Var5 | Var4 | Var3 |
| 2 | Var1 | Var2 | Var3 |
| 3 | Var3 | Var2 | Var1 |
| 4 | Var5 | Var1 | Var3 |
...
I have tried using df.idxmax(axis=1), which returns the column name of the 1st maximum, but I am unsure how to compute the other two.
Any help on this would be truly appreciated, thanks!
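For reference, here is what the attempted approach gives on the sample data. This is a small sketch that rebuilds the table by hand (the frame construction is an assumption, not part of the question); note that the pandas method is spelled `idxmax`. It produces only the "1st Max" column.

```python
import pandas as pd

# Sample data copied by hand from the question's table
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Var1": [1, 10, 25, 102],
    "Var2": [2, 9, 37, 11],
    "Var3": [3, 8, 41, 72],
    "Var4": [4, 7, 24, 56],
    "Var5": [5, 6, 21, 151],
}).set_index("ID")

# idxmax(axis=1) returns, for each row, the label of the column holding the maximum
first_max = df.idxmax(axis=1)
print(first_max)
```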
Recommended answer
Use numpy.argsort to get the column positions of each row's values in sorted order, select the top 3 by indexing, and finally pass the result to the DataFrame constructor:
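import numpy as np
import pandas as pd

df = df.set_index('ID')
# negate the values so argsort orders each row from largest to smallest, then keep 3 positions
df = pd.DataFrame(df.columns.values[np.argsort(-df.values, axis=1)[:, :3]],
                  index=df.index,
                  columns=['1st Max', '2nd Max', '3rd Max']).reset_index()
print(df)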
ID 1st Max 2nd Max 3rd Max
0 1 Var5 Var4 Var3
1 2 Var1 Var2 Var3
2 3 Var3 Var2 Var1
3 4 Var5 Var1 Var3
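To make the intermediate steps of the argsort approach visible, here is a self-contained sketch on the question's sample data (rebuilt by hand from the table). The `kind="stable"` argument is an addition not in the original answer: with a stable sort, tied values keep their left-to-right column order, which the default sort algorithm does not guarantee.

```python
import numpy as np
import pandas as pd

# Sample data copied by hand from the question's table
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Var1": [1, 10, 25, 102],
    "Var2": [2, 9, 37, 11],
    "Var3": [3, 8, 41, 72],
    "Var4": [4, 7, 24, 56],
    "Var5": [5, 6, 21, 151],
}).set_index("ID")

# argsort sorts ascending, so negating the values puts the largest first.
# kind="stable" (an added assumption) keeps ties in column order.
order = np.argsort(-df.to_numpy(), axis=1, kind="stable")
top3_pos = order[:, :3]                       # first three column positions per row
top3_names = df.columns.to_numpy()[top3_pos]  # map positions to column labels

result = pd.DataFrame(top3_names, index=df.index,
                      columns=["1st Max", "2nd Max", "3rd Max"]).reset_index()
print(top3_pos)
print(result)
```

For ID 4, for example, the values are [102, 11, 72, 56, 151], so the positions of the three largest are 4, 0 and 2, which map to Var5, Var1 and Var3.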
Or, if performance is not important, use nlargest with apply on each row:
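import pandas as pd

c = ['1st Max', '2nd Max', '3rd Max']
# start from the original df (with its ID column); for each row,
# nlargest(3).index holds the column names of the 3 largest values
df = (df.set_index('ID')
        .apply(lambda x: pd.Series(x.nlargest(3).index, index=c), axis=1)
        .reset_index())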
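A self-contained sketch of the nlargest variant, again on the sample data rebuilt by hand from the question. The helper name `top3_labels` is hypothetical; it only replaces the lambda with a named function so the per-row step is easier to read.

```python
import pandas as pd

# Sample data copied by hand from the question's table
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Var1": [1, 10, 25, 102],
    "Var2": [2, 9, 37, 11],
    "Var3": [3, 8, 41, 72],
    "Var4": [4, 7, 24, 56],
    "Var5": [5, 6, 21, 151],
})
cols = ["1st Max", "2nd Max", "3rd Max"]

def top3_labels(row: pd.Series) -> pd.Series:
    # hypothetical helper: nlargest(3) keeps the three largest values in
    # descending order; its index holds the matching column names
    return pd.Series(row.nlargest(3).index, index=cols)

result = df.set_index("ID").apply(top3_labels, axis=1).reset_index()
print(result)
```

Because `apply` calls the Python function once per row, this version is usually much slower than the vectorized argsort one on a large frame, which is why it is only suggested when performance does not matter. When values tie, `nlargest` keeps the first occurrence by default (`keep='first'`).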