python pandas new column categorization based on conditions in other columns
Question
Working with the following python pandas dataframe df:
import pandas as pd

df = pd.DataFrame({'transaction_id': ['A123','A123','B345','B345','C567','C567','D678','D678'],
                   'product_id': [255472, 251235, 253764,257344,221577,209809,223551,290678],
                   'product_category': ['X','X','Y','Y','X','Y','Y','X']})
transaction_id | product_id | product_category
A123 255472 X
A123 251235 X
B345 253764 Y
B345 257344 Y
C567 221577 X
C567 209809 Y
D678 223551 Y
D678 290678 X
I need to add another column, "transaction_category", which looks at the transaction_id and at which product categories appear in that transaction. This is the output I am looking for:
transaction_id | product_id | product_category | transaction_category
A123 255472 X X only
A123 251235 X X only
B345 253764 Y Y only
B345 257344 Y Y only
C567 221577 X X & Y
C567 209809 Y X & Y
D678 223551 Y X & Y
D678 290678 X X & Y
Please note that I have other columns in my dataframe that I am not using, so I guess I need to start with a groupby?
# My attempt so far (incomplete: no aggregation is applied before reset_index)
df2 = df.groupby(['transaction_id','product_category']).reset_index()
Recommended answer
IIUC, you can do this by using transform and join:
df.groupby('transaction_id').product_category.transform(lambda x : '&'.join(sorted(set(x))))
Out[468]:
0 X
1 X
2 Y
3 Y
4 X&Y
5 X&Y
6 X&Y
7 X&Y
Name: product_category, dtype: object
Following Scott's suggestion, to match your expected output:
df['transaction_category'] = df.groupby('transaction_id')['product_category'].transform(lambda x: x + ' only' if len(set(x)) < 2 else ' & '.join(sorted(set(x))))
df
Out[479]:
product_category product_id transaction_id transaction_category
0 X 255472 A123 X only
1 X 251235 A123 X only
2 Y 253764 B345 Y only
3 Y 257344 B345 Y only
4 X 221577 C567 X & Y
5 Y 209809 C567 X & Y
6 Y 223551 D678 X & Y
7 X 290678 D678 X & Y
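If the one-line lambda is hard to read, the same idea can be split into a named function plus a map back onto the rows. This is only a sketch under a few assumptions: `label_categories` is a hypothetical helper name (not part of pandas), and the categories are sorted so that the label always reads "X & Y" rather than depending on set ordering.

```python
import pandas as pd

df = pd.DataFrame({
    'transaction_id': ['A123', 'A123', 'B345', 'B345', 'C567', 'C567', 'D678', 'D678'],
    'product_id': [255472, 251235, 253764, 257344, 221577, 209809, 223551, 290678],
    'product_category': ['X', 'X', 'Y', 'Y', 'X', 'Y', 'Y', 'X'],
})

def label_categories(categories):
    """Hypothetical helper: build one label from the categories of a single transaction."""
    unique = sorted(set(categories))  # sorted so the label order is deterministic
    if len(unique) == 1:
        return unique[0] + ' only'
    return ' & '.join(unique)

# Aggregate once per transaction, giving a Series indexed by transaction_id ...
labels = df.groupby('transaction_id')['product_category'].agg(label_categories)

# ... then map that label back onto every row of the original frame.
df['transaction_category'] = df['transaction_id'].map(labels)
print(df)
```

Because the label is computed once per transaction and then mapped, any other columns in the dataframe are left untouched, which seems to be what the question asks for.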