Return a list of column names as a new column based on a condition in pandas


Problem description

My data for each customer and product looks like this:

Customer  P1   P2   P3   P4   P5   P6 
c1        10   2    43   21   11   4 
c2        1    3    32   1    6    3  
c3        20   4    20   72   78   80
c4        30   80   31   31   29   20

I would like the output to look like this:

Customer  P1   P2   P3   P4   P5   P6   Top_Products (based on scores)
c1        10   2    43   21   11   4    [P3,P4,P5]
c2        1    3    32   1    6    3    [P3,P5,P2]
c3        20   4    20   72   78   80   [P6,P5,P4]
c4        30   80   31   31   29   20   [P2,P3,P4]

Explanation of the output: for each customer, I sort the product scores horizontally (across the row), take the column names of the top 3 scores in descending order, and store them as a list in a new column, "Top_Products".

E.g. for the first row, P3, P4 and P5 have the highest scores (ordered from best to worst), so they are put in the new column as a list.
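The rule described for the first row can be checked in plain Python before involving pandas. This is only a small sketch; the names `row_c1` and `top3` are made up for the illustration, and the scores are copied from the c1 row above.

```python
# Scores for customer c1, copied from the sample data.
row_c1 = {"P1": 10, "P2": 2, "P3": 43, "P4": 21, "P5": 11, "P6": 4}

# Sort the product names by score, highest first, and keep the first three.
top3 = sorted(row_c1, key=row_c1.get, reverse=True)[:3]
print(top3)
```

This gives P3, P4, P5 for c1, which matches the desired Top_Products entry.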

Recommended answer

First select all P columns with iloc, get the positions of the sorted values with numpy.argsort, index the column names with those positions, and finally convert the result to lists. Because argsort sorts in ascending order, this version returns the three lowest-scoring products:

import numpy as np

# all product score columns (everything after Customer)
df1 = df.iloc[:, 1:]

df['Top_Products'] = df1.columns.values[np.argsort(df1.to_numpy(), axis=1)[:, :3]].tolist()
print(df)
  Customer  P1  P2  P3  P4  P5  P6  Top_Products
0       c1  10   2  43  21  11   4  [P2, P6, P1]
1       c2   1   3  32   1   6   3  [P1, P4, P2]
2       c3  20   4  20  72  78  80  [P2, P1, P3]
3       c4  30  80  31  31  29  20  [P6, P5, P1]
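The one-liner above packs several steps together, so here is a sketch that spells them out. It rebuilds the sample data so it can run on its own. The variable names (`scores`, `order`, `lowest3`) are illustrative, and `kind="stable"` is an addition not present in the answer, used here so that tied scores keep their column order.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        ["c1", 10, 2, 43, 21, 11, 4],
        ["c2", 1, 3, 32, 1, 6, 3],
        ["c3", 20, 4, 20, 72, 78, 80],
        ["c4", 30, 80, 31, 31, 29, 20],
    ],
    columns=["Customer", "P1", "P2", "P3", "P4", "P5", "P6"],
)
scores = df.iloc[:, 1:]

# Step 1: for each row, the column positions that would sort it ascending.
order = np.argsort(scores.to_numpy(), axis=1, kind="stable")

# Step 2: keep the first three positions per row (the three lowest scores).
lowest3_pos = order[:, :3]

# Step 3: index the array of column labels with the 2-D position array;
# the result has one row of three labels per customer.
lowest3 = scores.columns.to_numpy()[lowest3_pos]

# Step 4: convert to a list of lists so each cell holds a Python list.
df["Top_Products"] = lowest3.tolist()
print(df)
```

Indexing a 1-D label array with a 2-D integer array returns a 2-D array of labels, which is why no explicit loop over rows is needed.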

If performance is not important or the number of rows is small, the same result (the three lowest scores) can be obtained with Series.nsmallest, converting the resulting index to a list:

df['Top_Products'] = df1.apply(lambda x: x.nsmallest(3).index.tolist(), axis=1)
print (df)
  Customer  P1  P2  P3  P4  P5  P6  Top_Products
0       c1  10   2  43  21  11   4  [P2, P6, P1]
1       c2   1   3  32   1   6   3  [P1, P4, P2]
2       c3  20   4  20  72  78  80  [P2, P1, P3]
3       c4  30  80  31  31  29  20  [P6, P5, P1]

For the top 3 products by highest score, which is what the question asks for, the solution is very similar: just negate the array, i.e. use -df1.to_numpy():

df1 = df.iloc[:, 1:]

df['Top_Products'] = df1.columns.values[np.argsort(-df1.to_numpy(), axis=1)[:, :3]].tolist()
print (df)
  Customer  P1  P2  P3  P4  P5  P6  Top_Products
0       c1  10   2  43  21  11   4  [P3, P4, P5]
1       c2   1   3  32   1   6   3  [P3, P5, P2]
2       c3  20   4  20  72  78  80  [P6, P5, P4]
3       c4  30  80  31  31  29  20  [P2, P3, P4] 
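As a cross-check, the highest-score variant can also be written with Series.nlargest, the counterpart of the nsmallest approach. The sketch below is not part of the original answer: it rebuilds the sample data, uses illustrative names (`top3_apply`, `top3_argsort`), and adds `kind="stable"` to the argsort call on the assumption that tied scores should be listed in column order.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        ["c1", 10, 2, 43, 21, 11, 4],
        ["c2", 1, 3, 32, 1, 6, 3],
        ["c3", 20, 4, 20, 72, 78, 80],
        ["c4", 30, 80, 31, 31, 29, 20],
    ],
    columns=["Customer", "P1", "P2", "P3", "P4", "P5", "P6"],
)
scores = df.iloc[:, 1:]

# Row-wise nlargest: slower than argsort on large frames,
# but it reads directly as "top 3 per row".
top3_apply = scores.apply(
    lambda row: row.nlargest(3).index.tolist(), axis=1
).tolist()

# Negated argsort with a stable sort, so ties keep their column order.
order = np.argsort(-scores.to_numpy(), axis=1, kind="stable")[:, :3]
top3_argsort = scores.columns.to_numpy()[order].tolist()

print(top3_apply == top3_argsort)
```

On this sample both approaches agree, including on the tied rows (P2 and P6 for c2, P3 and P4 for c4). Note that negation only works for numeric scores; nlargest avoids that trick at the cost of a Python-level loop over rows.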
