Finding top N columns for each row in data frame


Problem description


Given a data frame with one descriptive column and X numeric columns, for each row I'd like to identify the N columns with the highest values and save them as rows in a new data frame.

For example, consider the following data frame:

import pandas as pd

df = pd.DataFrame()
df['index'] = ['A', 'B', 'C', 'D', 'E', 'F']
df['option1'] = [1, 5, 3, 7, 9, 3]
df['option2'] = [8, 4, 5, 6, 9, 2]
df['option3'] = [9, 9, 1, 3, 9, 5]
df['option4'] = [3, 8, 3, 5, 7, 0]
df['option5'] = [2, 3, 4, 9, 4, 2]

I'd like to output (let's say N is 3, so I want the top 3):

A,option3
A,option2
A,option4

B,option3
B,option4
B,option1

C,option2
C,option5
C,option4 (or option1 - ties aren't really a problem)

D,option5
D,option1
D,option2

and so on...

Any idea how this can be achieved easily? Thanks.

Solution

If you just want pairings:

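from operator import itemgetter as it
from itertools import repeat
n = 3

# In pandas < 0.17, use order() instead of sort_values().
# items() replaces iteritems(), which was removed in pandas 2.0.
# iloc[1:] skips the descriptive "index" column by position.
new_d = (zip(repeat(row["index"]), map(it(0), row.iloc[1:].sort_values(ascending=False).iloc[:n].items()))
         for _, row in df.iterrows())
for row in new_d:
    print(list(row))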

Output:

[('A', 'option3'), ('A', 'option2'), ('A', 'option4')]
[('B', 'option3'), ('B', 'option4'), ('B', 'option1')]
[('C', 'option2'), ('C', 'option5'), ('C', 'option1')]
[('D', 'option5'), ('D', 'option1'), ('D', 'option2')]
[('E', 'option1'), ('E', 'option2'), ('E', 'option3')]
[('F', 'option3'), ('F', 'option1'), ('F', 'option2')]

Within each row, the pairs come out ordered from the highest value to the lowest.

If you want a list of lists:

from operator import itemgetter as it
from itertools import repeat
n = 3

# Same expression as above, materialized into a list of lists.
new_d = [list(zip(repeat(row["index"]), map(it(0), row.iloc[1:].sort_values(ascending=False).iloc[:n].items())))
         for _, row in df.iterrows()]

Output:

[[('A', 'option3'), ('A', 'option2'), ('A', 'option4')],
[('B', 'option3'), ('B', 'option4'), ('B', 'option1')], 
[('C', 'option2'), ('C', 'option5'), ('C', 'option1')], 
[('D', 'option5'), ('D', 'option1'), ('D', 'option2')], 
[('E', 'option1'), ('E', 'option2'), ('E', 'option3')],
[('F', 'option3'), ('F', 'option1'), ('F', 'option2')]]
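
The question asks for the pairs as rows of a new data frame. One possible way to get there is sketched below. It uses numpy.argsort on the numeric columns instead of the row-by-row approach in this answer, so it is an alternative technique, not the answer's own method. The helper name top_n_columns and the output column name "option" are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "index": ["A", "B", "C", "D", "E", "F"],
    "option1": [1, 5, 3, 7, 9, 3],
    "option2": [8, 4, 5, 6, 9, 2],
    "option3": [9, 9, 1, 3, 9, 5],
    "option4": [3, 8, 3, 5, 7, 0],
    "option5": [2, 3, 4, 9, 4, 2],
})
n = 3


def top_n_columns(frame, n, label_col="index"):
    """Return a long-format frame with n (label, column) rows per input row.

    Hypothetical helper: the name and the "option" column are illustrative.
    """
    numeric = frame.drop(columns=label_col)
    cols = numeric.columns.to_numpy()
    # Negating the values turns an ascending argsort into a descending one;
    # kind="stable" keeps tied values in their original column order.
    order = np.argsort(-numeric.to_numpy(), axis=1, kind="stable")[:, :n]
    return pd.DataFrame({
        label_col: np.repeat(frame[label_col].to_numpy(), n),
        "option": cols[order].ravel(),
    })


result = top_n_columns(df, n)
print(result)
```

Each input row contributes n consecutive rows to result, so the first three rows belong to A, the next three to B, and so on.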

Or using Python's sorted:

# Reuses the imports and n from above; sorted() orders the (column, value) pairs by value.
new_d = [list(zip(repeat(row["index"]), map(it(0), sorted(row.iloc[1:].items(), key=it(1), reverse=True)[:n])))
         for _, row in df.iterrows()]

This is actually the fastest of the three. If you really want strings, it is trivial to format the output however you like.
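
As a rough illustration of that formatting step, the sketch below joins each pair with a comma and separates the groups with a blank line, matching the layout requested in the question. The new_d literal copies the first two rows of the list-of-lists output above, and format_pairs is a hypothetical helper name.

```python
# First two groups from the list-of-lists output shown earlier.
new_d = [
    [("A", "option3"), ("A", "option2"), ("A", "option4")],
    [("B", "option3"), ("B", "option4"), ("B", "option1")],
]


def format_pairs(groups):
    """Render each (label, column) pair as 'label,column', one per line.

    Hypothetical helper: groups are separated by a blank line.
    """
    return "\n\n".join(
        "\n".join(",".join(pair) for pair in group) for group in groups
    )


text = format_pairs(new_d)
print(text)
```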
