Pandas: Finding cross sell in two columns in a data frame

Question

What I'm trying to do is a kind of a cross sell.

I have a Pandas dataframe with two columns, one with receipt numbers, and the other with product ids:

receipt  product
1        a
1        b
2        c
3        b
3        a

Most of the receipts have many products. What I need to find is the count of combinations of products that happen in the receipts. Let's say products 'a' and 'b' are the most common combination (they appear together in most of the receipts), how do I find this information?

I tried using df.groupby(['receipt','product']).count(), but this only gives me the count for each receipt + product combination, not the count of product pairs that occur together on a receipt.

Any help is appreciated, thanks!

Recommended answer

I think you can do a self-merge on receipt, which pairs up the products within each receipt:

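# Self-merge on receipt: pairs every product with every product on the same receipt
new_df = df.merge(df, on='receipt')
# Keep each unordered pair once (product_x < product_y also drops self-pairs),
# then count the rows per pair
(new_df[new_df['product_x'] < new_df['product_y']]
     .groupby(['product_x','product_y'])['receipt'].count()
)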

Output:

product_x  product_y
a          b            2
Name: receipt, dtype: int64
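
For reference, here is a self-contained sketch of the same idea that can be run as-is on the sample data from the question. The helper name `pair_counts` is made up for illustration. Dropping duplicates and using `nunique` are assumptions on my part: they make a product that is listed twice on one receipt count only once, which the question does not say either way.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "receipt": [1, 1, 2, 3, 3],
    "product": ["a", "b", "c", "b", "a"],
})

def pair_counts(frame):
    """Count the receipts on which each unordered product pair occurs together.

    Hypothetical helper, not part of pandas.
    """
    # Assumption: a product repeated on one receipt should only count once
    unique = frame.drop_duplicates(subset=["receipt", "product"])
    # Self-merge: every product paired with every product on the same receipt
    pairs = unique.merge(unique, on="receipt")
    # Keep each unordered pair once and drop self-pairs such as (a, a)
    pairs = pairs[pairs["product_x"] < pairs["product_y"]]
    return (
        pairs.groupby(["product_x", "product_y"])["receipt"]
        .nunique()
        .sort_values(ascending=False)  # most common pair first
    )

counts = pair_counts(df)
print(counts)
```

With the result sorted like this, the most common combination is the first row, and `counts.idxmax()` returns it as a tuple. Keep in mind that the self-merge produces one row per ordered pair of products on each receipt, so the intermediate frame grows quadratically with receipt size.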
