Getting the three smallest values per row and returning the corresponding column names


Problem description

I have two DataFrames, df and df2, with matching rows and columns. Based on the first DataFrame, df, I want to find the 3 smallest values in each row and return the names of the corresponding columns (in this case "X", "Y", "Z" or "T"), giving a new DataFrame, df3.

import numpy as np
import pandas as pd

df = pd.DataFrame({
        'X': [21, 2, 43, 44, 56, 67, 7, 38, 29, 130],
        'Y': [101, 220, 330, 140, 250, 10, 207, 320, 420, 50],
        'Z': [20, 128, 136, 144, 312, 10, 82, 63, 42, 12],
        'T': [2, 32, 4, 424, 256, 167, 27, 38, 229, 30]
    }, index=list('ABCDEFGHIJ'))

df2 = pd.DataFrame({
        'X': [0.5, 0.12,0.43, 0.424, 0.65,0.867,0.17,0.938,0.229,0.113],
        'Y': [0.1,2.201,0.33,0.140,0.525,0.31,0.20,0.32,0.420,0.650],
        'Z': [0.20,0.128,0.136,0.2144,0.5312,0.61,0.82,0.363,0.542,0.512],
        'T':[0.52, 0.232,0.34, 0.6424, 0.6256,0.3167,0.527,0.38,0.4229,0.73]
    },index=list('ABCDEFGHIJ'))

In addition, I want another DataFrame, df4, holding the values from df2 at the positions found for df3. For example, in row 'A' of df the 3 smallest values are (2, 20, 21), so in row 'A' of df4 I want the corresponding df2 values (0.52, 0.2, 0.5).

Thanks.

Recommended answer

If both DataFrames have the same column names in the same order, you can use argsort to get the column positions:

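# positions of the 3 smallest values in each row, in ascending order of value
# (the output below was produced with the columns in sorted order T, X, Y, Z,
# as older pandas versions arranged dict input; with the order X, Y, Z, T the
# numbers differ, but they still point at the same columns except where values tie)
arr = df.values.argsort(1)[:,:3]
print (arr)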
[[0 3 1]
 [1 0 3]
 [0 1 3]
 [1 2 3]
 [1 2 0]
 [2 3 1]
 [1 0 3]
 [0 1 3]
 [1 3 0]
 [3 0 2]]

#get values by indices in arr 
b = df2.values[np.arange(len(arr))[:,None], arr]
print (b)
[[ 0.52    0.2     0.5   ]
 [ 0.12    0.232   0.128 ]
 [ 0.34    0.43    0.136 ]
 [ 0.424   0.14    0.2144]
 [ 0.65    0.525   0.6256]
 [ 0.31    0.61    0.867 ]
 [ 0.17    0.527   0.82  ]
 [ 0.38    0.938   0.363 ]
 [ 0.229   0.542   0.4229]
 [ 0.512   0.73    0.65  ]]
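
The row-wise lookup above relies on NumPy broadcasting: a column vector of row numbers is paired with the matrix of column positions. Here is a minimal sketch on toy arrays; `vals`, `other`, `pos` and `rows` are made-up names and data for illustration, not part of the question.

```python
import numpy as np

# Hypothetical toy data (not from the question).
vals = np.array([[30, 10, 20, 40],
                 [ 5, 50, 15, 25]])
other = np.array([[0.3, 0.1, 0.2, 0.4],
                  [0.5, 5.0, 1.5, 2.5]])

# Column positions of the three smallest values in each row, ascending by value.
pos = vals.argsort(axis=1)[:, :3]

# rows has shape (2, 1) and broadcasts against pos with shape (2, 3),
# so element [i, j] of the result is other[i, pos[i, j]].
rows = np.arange(len(pos))[:, None]
picked = other[rows, pos]

# np.take_along_axis expresses the same row-wise lookup in one call.
same = np.take_along_axis(other, pos, axis=1)

print(pos)
print(picked)
```

Both `picked` and `same` hold, for each row, the entries of `other` that sit in the columns where `vals` has its three smallest values.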

Finally, use the DataFrame constructor:

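# index the underlying array of names; newer pandas versions reject 2-D indexing on an Index
df3 = pd.DataFrame(df.columns.values[arr])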
df3.columns = ['Col{}'.format(x+1) for x in df3.columns]
print (df3)
  Col1 Col2 Col3
0    T    Z    X
1    X    T    Z
2    T    X    Z
3    X    Y    Z
4    X    Y    T
5    Y    Z    X
6    X    T    Z
7    T    X    Z
8    X    Z    T
9    Z    T    Y

df4 = pd.DataFrame(b)
df4.columns = ['Col{}'.format(x+1) for x in df4.columns]
print (df4)
    Col1   Col2    Col3
0  0.520  0.200  0.5000
1  0.120  0.232  0.1280
2  0.340  0.430  0.1360
3  0.424  0.140  0.2144
4  0.650  0.525  0.6256
5  0.310  0.610  0.8670
6  0.170  0.527  0.8200
7  0.380  0.938  0.3630
8  0.229  0.542  0.4229
9  0.512  0.730  0.6500
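
For reference, the whole approach can be put together in one self-contained sketch that also keeps the row labels A to J. This is a variation on the answer, not the answer's exact code. It assumes an explicit column order `cols`, a stable sort, and `np.take_along_axis` for the lookup. Because ties are then broken by column order, tied rows (such as H, where X and T are both 38) may come out in a different order than in the output printed above.

```python
import numpy as np
import pandas as pd

cols = ['X', 'Y', 'Z', 'T']   # explicit order so both frames are guaranteed to match
idx = list('ABCDEFGHIJ')

df = pd.DataFrame({
    'X': [21, 2, 43, 44, 56, 67, 7, 38, 29, 130],
    'Y': [101, 220, 330, 140, 250, 10, 207, 320, 420, 50],
    'Z': [20, 128, 136, 144, 312, 10, 82, 63, 42, 12],
    'T': [2, 32, 4, 424, 256, 167, 27, 38, 229, 30],
}, index=idx)[cols]

df2 = pd.DataFrame({
    'X': [0.5, 0.12, 0.43, 0.424, 0.65, 0.867, 0.17, 0.938, 0.229, 0.113],
    'Y': [0.1, 2.201, 0.33, 0.140, 0.525, 0.31, 0.20, 0.32, 0.420, 0.650],
    'Z': [0.20, 0.128, 0.136, 0.2144, 0.5312, 0.61, 0.82, 0.363, 0.542, 0.512],
    'T': [0.52, 0.232, 0.34, 0.6424, 0.6256, 0.3167, 0.527, 0.38, 0.4229, 0.73],
}, index=idx)[cols]

# A stable sort keeps the left-most column first when values tie.
arr = df.to_numpy().argsort(axis=1, kind='stable')[:, :3]

names = ['Col1', 'Col2', 'Col3']
# Column names of the three smallest values per row.
df3 = pd.DataFrame(df.columns.to_numpy()[arr], index=df.index, columns=names)
# Values from df2 at the same positions.
df4 = pd.DataFrame(np.take_along_axis(df2.to_numpy(), arr, axis=1),
                   index=df.index, columns=names)

print(df3)
print(df4)
```

Passing `index=df.index` to the constructor is what carries the labels A to J over to df3 and df4, where the answer's version falls back to the default 0 to 9 index.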

The answers are similar, so I created timings:

np.random.seed(14)
N = 1000000
df1 = pd.DataFrame(np.random.randint(100, size=(N, 4)), columns=['X','Y','Z','T'])
#print (df1)

df1 = pd.DataFrame(np.random.rand(N, 4), columns=['X','Y','Z','T'])
#print (df1)
# NOTE: as written, the functions below read df and df2, not df1


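def jez():
    # full argsort of each row, then keep the first 3 positions
    arr = df.values.argsort(1)[:,:3]
    b = df2.values[np.arange(len(arr))[:,None], arr]
    # index the underlying array of names; newer pandas versions reject 2-D indexing on an Index
    df3 = pd.DataFrame(df.columns.values[arr])
    df3.columns = ['Col{}'.format(x+1) for x in df3.columns]
    df4 = pd.DataFrame(b)
    df4.columns = ['Col{}'.format(x+1) for x in df4.columns]
    return df3, df4


def pir():
    v = df.values
    # argpartition finds the 3 smallest per row without fully sorting them
    a = v.argpartition(3, 1)[:, :3]
    c = df.columns.values[a]
    df3 = pd.DataFrame(c, df.index)
    d = df2.values[np.arange(len(df))[:, None], a]
    df4 = pd.DataFrame(d, df.index, [1, 2, 3]).add_prefix('Col')
    return df3, df4

def cᴏʟᴅsᴘᴇᴇᴅ():
    #another solution is wrong
    # relies on DataFrame.lookup, which was deprecated in pandas 1.2 and removed in 2.0
    df3 = df.apply(lambda x: df.columns[np.argsort(x)], 1).iloc[:, :3]
    df4 = pd.DataFrame({'Col{}'.format(i + 1) : df2.lookup(df3.index, df3.iloc[:, i]) for i in range(df3.shape[1])}, index=df.index)
    return df3, df4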


print (jez())
print (pir())
print (cᴏʟᴅsᴘᴇᴇᴅ())
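
One difference between the first two functions is worth spelling out: argsort returns the three positions in ascending order of value, while argpartition only guarantees that the three smallest come first, in no particular order. If the order matters, the three picks can be sorted afterwards. Here is a minimal sketch on toy data; the names `vals`, `part`, `order` and `ordered` are hypothetical and `k - 1` is used as the partition index.

```python
import numpy as np

# Hypothetical toy data (not from the question).
vals = np.array([[30, 10, 20, 40],
                 [ 5, 50, 15, 25]])
k = 3

# Positions of the k smallest values per row, in unspecified order.
part = np.argpartition(vals, k - 1, axis=1)[:, :k]

# Sort only those k picks by their values to recover ascending order.
picked_vals = np.take_along_axis(vals, part, axis=1)
order = picked_vals.argsort(axis=1)
ordered = np.take_along_axis(part, order, axis=1)

print(ordered)
```

Sorting only k entries per row keeps the extra cost small when the frame has many columns. With just four columns, as here, the gain over a plain argsort is likely to be minor.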





In [176]: %timeit (jez())
1000 loops, best of 3: 412 µs per loop

In [177]: %timeit (pir())
1000 loops, best of 3: 425 µs per loop

In [178]: %timeit (cᴏʟᴅsᴘᴇᴇᴅ())
100 loops, best of 3: 3.99 ms per loop
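
To repeat the comparison outside IPython on larger inputs, the standard `timeit` module can be used. The sketch below is an assumption-laden variation, not the answer's benchmark. The frame size `N`, the names `big` and `big2`, and the helpers `by_argsort` and `by_argpartition` are all hypothetical. No timings are quoted here because they depend on the machine and library versions.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(14)
N = 100_000                      # assumed size, chosen only for illustration
cols = ['X', 'Y', 'Z', 'T']
big = pd.DataFrame(rng.integers(0, 100, size=(N, 4)), columns=cols)
big2 = pd.DataFrame(rng.random((N, 4)), columns=cols)


def by_argsort(a, b):
    # Full sort of each row, keep the first three positions.
    pos = a.to_numpy().argsort(axis=1)[:, :3]
    return a.columns.to_numpy()[pos], np.take_along_axis(b.to_numpy(), pos, axis=1)


def by_argpartition(a, b):
    # Partial sort: three smallest first, order among them unspecified.
    pos = np.argpartition(a.to_numpy(), 3, axis=1)[:, :3]
    return a.columns.to_numpy()[pos], np.take_along_axis(b.to_numpy(), pos, axis=1)


runs = 5
t_sort = timeit.timeit(lambda: by_argsort(big, big2), number=runs) / runs
t_part = timeit.timeit(lambda: by_argpartition(big, big2), number=runs) / runs

print(f"argsort:      {t_sort * 1e3:.2f} ms per call")
print(f"argpartition: {t_part * 1e3:.2f} ms per call")
```

Passing the frames as arguments makes it explicit which data each function is timed on.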
