Return a list of column names as a new column based on a condition in pandas
Problem description
My data for each customer and product looks like this:
Customer P1 P2 P3 P4 P5 P6
c1 10 2 43 21 11 4
c2 1 3 32 1 6 3
c3 20 4 20 72 78 80
c4 30 80 31 31 29 20
I would like the output to look like this:
Customer P1 P2 P3 P4 P5 P6 Top_Products (based on scores)
c1 10 2 43 21 11 4 [P3,P4,P5]
c2 1 3 32 1 6 3 [P3,P5,P2]
c3 20 4 20 72 78 80 [P6,P5,P4]
c4 30 80 31 31 29 20 [P2,P3,P4]
Explanation of the output: I sort the product scores horizontally for each customer, take the column names of the top 3 scores (in descending order), and put them in a list in a new column, "Top_Products", for each customer.
E.g. for the 1st row, P3, P4 and P5 have the highest scores (ordered from the best score down) and are put in another column as a list.
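Recommended answer
First select all the P columns with iloc, get the positions of the sorted values with numpy.argsort, index the column names with those positions, and finally convert the result to lists. Because argsort sorts in ascending order, this first version returns the three lowest-scoring products per row: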
import numpy as np

df1 = df.iloc[:, 1:]  # all product columns (everything after Customer)
df['Top_Products'] = df1.columns.values[np.argsort(df1.to_numpy(), axis=1)[:, :3]].tolist()
print(df)
Customer P1 P2 P3 P4 P5 P6 Top_Products
0 c1 10 2 43 21 11 4 [P2, P6, P1]
1 c2 1 3 32 1 6 3 [P1, P4, P2]
2 c3 20 4 20 72 78 80 [P2, P1, P3]
3 c4 30 80 31 31 29 20 [P6, P5, P1]
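To make the indexing step easier to follow, here is a minimal self-contained sketch that rebuilds the sample data from the question and keeps the intermediate positions array. The variable names scores, positions and bottom3 are mine, and kind="stable" is an addition (not part of the answer above) so that tied scores keep their left-to-right column order.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Customer": ["c1", "c2", "c3", "c4"],
    "P1": [10, 1, 20, 30],
    "P2": [2, 3, 4, 80],
    "P3": [43, 32, 20, 31],
    "P4": [21, 1, 72, 31],
    "P5": [11, 6, 78, 29],
    "P6": [4, 3, 80, 20],
})

scores = df.iloc[:, 1:]  # product columns only

# Column positions of each row's values in ascending order, cut to the first 3.
positions = np.argsort(scores.to_numpy(), axis=1, kind="stable")[:, :3]

# Indexing the 1-D array of column names with a 2-D integer array
# gives a 2-D array of names with the same shape as positions.
bottom3 = scores.columns.to_numpy()[positions].tolist()

print(positions.tolist())
print(bottom3)
```

Each row of positions holds integer column offsets, and the column-name array turns them into labels in one vectorized step, so no Python-level loop over rows is needed.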
If performance is not important, or the number of rows is small, use Series.nsmallest on each row and convert the resulting index to a list:
df['Top_Products'] = df1.apply(lambda x: x.nsmallest(3).index.tolist(), axis=1)
print (df)
Customer P1 P2 P3 P4 P5 P6 Top_Products
0 c1 10 2 43 21 11 4 [P2, P6, P1]
1 c2 1 3 32 1 6 3 [P1, P4, P2]
2 c3 20 4 20 72 78 80 [P2, P1, P3]
3 c4 30 80 31 31 29 20 [P6, P5, P1]
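As a small illustration of what the lambda sees, the sketch below applies nsmallest to a single row. With axis=1, apply passes each row as a Series indexed by the column names; here the first row (customer c1) is written out by hand, and the names row, smallest and labels are only for illustration.

```python
import pandas as pd

# First row of the question's data (customer c1), indexed by product name.
row = pd.Series({"P1": 10, "P2": 2, "P3": 43, "P4": 21, "P5": 11, "P6": 4})

smallest = row.nsmallest(3)        # the three lowest values, ascending
labels = smallest.index.tolist()   # the column names that held those values

print(smallest.tolist())
print(labels)
```

The index of the result is what ends up in Top_Products, which is why the lambda calls .index.tolist().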
For the top 3 by highest score, as the question asks, the answer is very similar: just negate the array, i.e. pass -df1.to_numpy() to argsort:
df1 = df.iloc[:, 1:]
df['Top_Products'] = df1.columns.values[np.argsort(-df1.to_numpy(), axis=1)[:, :3]].tolist()
print (df)
Customer P1 P2 P3 P4 P5 P6 Top_Products
0 c1 10 2 43 21 11 4 [P3, P4, P5]
1 c2 1 3 32 1 6 3 [P3, P5, P2]
2 c3 20 4 20 72 78 80 [P6, P5, P4]
3 c4 30 80 31 31 29 20 [P2, P3, P4]
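If this is needed in more than one place, the negated-argsort idea could be wrapped in a small helper. The sketch below is one possible shape, not part of the original answer: top_k_columns is a hypothetical name, kind="stable" is my addition so that ties keep their left-to-right column order, and the Series.nlargest line is only a cross-check using the row-wise counterpart of nsmallest. Negation assumes signed numeric scores.

```python
import numpy as np
import pandas as pd


def top_k_columns(scores: pd.DataFrame, k: int = 3) -> list:
    """Return, for each row, the k column names with the highest values."""
    values = scores.to_numpy()
    # Negating the values makes an ascending sort put the largest first;
    # a stable sort keeps tied columns in their original order.
    order = np.argsort(-values, axis=1, kind="stable")[:, :k]
    return scores.columns.to_numpy()[order].tolist()


# Sample data copied from the question.
df = pd.DataFrame({
    "Customer": ["c1", "c2", "c3", "c4"],
    "P1": [10, 1, 20, 30],
    "P2": [2, 3, 4, 80],
    "P3": [43, 32, 20, 31],
    "P4": [21, 1, 72, 31],
    "P5": [11, 6, 78, 29],
    "P6": [4, 3, 80, 20],
})

scores = df.iloc[:, 1:]
top3 = top_k_columns(scores, k=3)

# Slower row-wise cross-check with Series.nlargest.
via_nlargest = scores.apply(lambda r: r.nlargest(3).index.tolist(), axis=1).tolist()

print(top3)
```

On this data both approaches agree, including the tied rows (P3/P4 for c4, and P2/P6 for c2, where only P2 makes the cut).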