Pandas: how to convert a list into a matrix grouped by a column?
Question
I have a pandas DataFrame where the first column (Customer) holds the customer's name, repeated once for every product (Product) that customer has purchased:
Customer Product Count
John A 1
John B 1
John C 1
Mary A 1
Mary B 1
Charles A 1
I want to pivot this data into a new DataFrame where both the rows and the columns are products (Product) and each value is the number of customers who bought both products, as follows:
    Product
    A  B  C
A   0  2  1
B   2  0  1
C   1  1  0
So if John bought A and also bought B, +1 is added to the A:B cell; he also bought A together with C, so the A:C cell gets +1, and so on. Note that Charles does not contribute to this DataFrame because he bought only one product.
I tried to use pandas.pivot_table, but this is what I got:
df = pd.pivot_table(df, index=['Product'], columns=['Product'], values=['Customer'])
>> KeyError: 'Level Product not found'
What method and parameters should I use?
Recommended answer
With crosstab:
d1 = df.merge(df, on='Customer').query('Product_x != Product_y')
pd.crosstab(d1.Product_x, d1.Product_y)
Product_y  A  B  C
Product_x
A          0  2  1
B          2  0  1
C          1  1  0
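For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same two steps. The DataFrame is rebuilt from the sample in the question; the variable names `pairs` and `result` are only illustrative and do not come from the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Customer": ["John", "John", "John", "Mary", "Mary", "Charles"],
    "Product": ["A", "B", "C", "A", "B", "A"],
    "Count": [1, 1, 1, 1, 1, 1],
})

# Self-merge on Customer: each row of the result is one ordered pair of
# products bought by the same customer. Overlapping column names get the
# default suffixes, giving Product_x and Product_y.
pairs = df.merge(df, on="Customer")

# Drop the pairs where a product is matched with itself (the diagonal).
# Charles only bought A, so all of his rows disappear here.
pairs = pairs.query("Product_x != Product_y")

# Count how often each (Product_x, Product_y) pair occurs.
result = pd.crosstab(pairs["Product_x"], pairs["Product_y"])
print(result)
```

Because the merge produces both (A, B) and (B, A) for each customer, the resulting matrix is symmetric.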
You can see this answer to get a better idea of how to speed up crosstab. The key insight for this problem is the self-merge: joining the frame to itself on Customer pairs up every two products bought by the same customer.
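If the self-merge gets too large (it grows with the square of the number of products per customer), one possible alternative is a matrix product. This is a sketch of a different technique, not taken from the answer above or from the answer it links to: build a Customer-by-Product count matrix, multiply its transpose by itself, then zero the diagonal. The names `m`, `arr` and `co` are hypothetical.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Customer": ["John", "John", "John", "Mary", "Mary", "Charles"],
    "Product": ["A", "B", "C", "A", "B", "A"],
    "Count": [1, 1, 1, 1, 1, 1],
})

# Rows are customers, columns are products, values are purchase counts.
m = pd.crosstab(df["Customer"], df["Product"])

# (m.T @ m)[p, q] sums, over customers, the count of p times the count of q,
# i.e. the number of customers who bought both p and q in this sample.
co = m.T.dot(m)

# The diagonal holds each product paired with itself; set it to zero
# on a copy of the underlying array and rebuild the frame.
arr = co.to_numpy().copy()
np.fill_diagonal(arr, 0)
co = pd.DataFrame(arr, index=co.index, columns=co.columns)
print(co)
```

On the sample data this gives the same 3x3 matrix as the merge-and-crosstab approach, without materialising every product pair as a row.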