Converting a long table to wide and creating columns according to the rows
Problem description
I have a data frame that looks like this:
Customer_ID Category Products
1 Veg A
2 Veg B
3 Fruit A
3 Fruit B
3 Veg B
1 Fruit A
3 Veg C
1 Fruit C
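To reproduce the problem, the sample frame can be built as in the sketch below. The values are copied from the table above; the variable name df matches the one used in the answer, and the pandas import is assumed.

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    'Customer_ID': [1, 2, 3, 3, 3, 1, 3, 1],
    'Category': ['Veg', 'Veg', 'Fruit', 'Fruit', 'Veg', 'Fruit', 'Veg', 'Fruit'],
    'Products': ['A', 'B', 'A', 'B', 'B', 'A', 'C', 'C'],
})
print(df)
```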
For each customer ID and each category, I want to find out which products were bought, and create a column for each product accordingly. The output would look like this:
Customer_ID Category Pro_1 Pro_2 Pro_3
1 Veg A NA NA
1 Fruit A NA C
2 Veg NA B NA
3 Veg NA B C
3 Fruit A B NA
Recommended answer
Use groupby with unstack. Note that if the data contains duplicate rows, their values are concatenated together:
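# Group by all three keys; summing the strings keeps the product label in each cell
df = df.groupby(['Customer_ID','Category','Products'])['Products'].sum().unstack()
# Rename the product columns to Pro_1, Pro_2, ...
df.columns = ['Pro_{}'.format(x) for x in range(1, len(df.columns)+1)]
df = df.reset_index()
print(df)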
Customer_ID Category Pro_1 Pro_2 Pro_3
0 1 Fruit A None C
1 1 Veg A None None
2 2 Veg None B None
3 3 Fruit A B None
4 3 Veg None B C
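The caveat about duplicate rows can be seen in a small sketch. The frame dup below is hypothetical (it is not part of the question's data), and the Products column is created with dtype=object on the assumption that plain Python strings are being summed.

```python
import pandas as pd

# Hypothetical frame in which the row (1, 'Veg', 'A') appears twice
dup = pd.DataFrame({
    'Customer_ID': [1, 1, 2],
    'Category': ['Veg', 'Veg', 'Veg'],
    'Products': pd.Series(['A', 'A', 'B'], dtype=object),
})

keys = ['Customer_ID', 'Category', 'Products']

# Summing strings concatenates them, so the duplicated label is repeated
summed = dup.groupby(keys)['Products'].sum()
print(summed.loc[(1, 'Veg', 'A')])

# Dropping duplicate triples first keeps each cell a single product label
deduped = dup.drop_duplicates(keys).groupby(keys)['Products'].sum()
print(deduped.loc[(1, 'Veg', 'A')])
```

If duplicate rows are possible in the real data, removing them before grouping avoids the repeated labels.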
Another solution uses a helper column; the (Customer_ID, Category, Products) triples have to be unique:
# If the triples are not unique, remove duplicates first
df = df.drop_duplicates(['Customer_ID','Category','Products'])
# Helper column holding the values to spread across the new columns
df['a'] = df['Products']
df = df.set_index(['Customer_ID','Category','Products'])['a'].unstack()
df.columns = ['Pro_{}'.format(x) for x in range(1, len(df.columns)+1)]
df = df.reset_index()
print(df)
Customer_ID Category Pro_1 Pro_2 Pro_3
0 1 Fruit A None C
1 1 Veg A None None
2 2 Veg None B None
3 3 Fruit A B None
4 3 Veg None B C
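As a variation that is not part of the original answer, the same helper-column idea can probably be expressed with DataFrame.pivot. This is only a sketch: it assumes a pandas version in which pivot accepts a list for index (1.1 or later), and the names long_df and wide are made up for illustration. Like the set_index approach, it requires unique triples.

```python
import pandas as pd

# Sample data copied from the question
long_df = pd.DataFrame({
    'Customer_ID': [1, 2, 3, 3, 3, 1, 3, 1],
    'Category': ['Veg', 'Veg', 'Fruit', 'Fruit', 'Veg', 'Fruit', 'Veg', 'Fruit'],
    'Products': ['A', 'B', 'A', 'B', 'B', 'A', 'C', 'C'],
})

keys = ['Customer_ID', 'Category', 'Products']

# Remove duplicate triples, then add a helper column 'a' holding the cell values
long_df = long_df.drop_duplicates(keys).assign(a=lambda d: d['Products'])

# One row per (Customer_ID, Category), one column per distinct product
wide = long_df.pivot(index=['Customer_ID', 'Category'], columns='Products', values='a')

# Rename the product columns to Pro_1, Pro_2, ... and restore the key columns
wide.columns = ['Pro_{}'.format(x) for x in range(1, len(wide.columns) + 1)]
wide = wide.reset_index()
print(wide)
```

Cells for products that were not bought come out as missing values, as in the outputs above.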