R basket analysis using the arules package with unique order numbers but duplicate order combinations


Problem description


I'm just learning R. I'm trying to do a basket analysis using the arules package (but I'm totally open to other package suggestions!) to compare all possible combinations of 6 different item types being purchased.

My original data set looks like this:

OrderNo, ItemType, ItemCount  
111, Health, 1  
111, Leisure, 2  
111, Sports, 1  
222, Health, 3      
333, Food, 7  
333, Clothing, 1  
444, Clothing, 2  
444, Health, 1  
444, Accessories, 2  

. . .

The list goes on and has about 3,000 observations.

I collapsed the data into a matrix with one row per unique order, where each column holds the count for a specific ItemType:

 OrderNo, Accessories, Clothing, Food, Health, Leisure, Sports  
 111, 0, 0, 0, 1, 2, 1  
 222, 0, 0, 0, 3, 0, 0  
 333, 0, 1, 7, 0, 0, 0  
 444, 2, 2, 0, 1, 0, 0  
 . . .

Every time I try to read in the transactions with the following command (or any of a million attempted variations of it):

tr <- read.transactions("dataset.csv", rm.duplicates=FALSE, format="basket", sep=",")

I get the error message: Error in asMethod(object): can not coerce list with transactions with duplicated items.

I'm assuming this is because I have 3,000 observations and inevitably certain combinations are going to show up more than once (e.g., more than one person purchases a single piece of Clothing and nothing else: OrderNo, 0, 1, 0, 0, 0, 0). I know I could collapse the data set to counts of unique combinations, but I'm worried that if I do that, there will be no weights to show which combinations are most frequent.

I thought that using format="basket" would account for different orders containing the same item combinations, but apparently that's not the case. I'm so lost. All the documentation I've read implies that this is possible, but I can't find any examples or advice on how to approach the problem.

Any advice would be much appreciated! My head is spinning on this one.

Extra info: for my end result, I'm looking to get the top five most significant purchase combinations. I don't know if that helps.

Recommended answer

You must remove the duplicates. If you are working from a .CSV file, run Data -> Remove Duplicates in Excel before processing it. arules throws an error when it finds duplicated items, which is why you are seeing this message.
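For context, the error text refers to duplicated items inside a single transaction: with format="basket", every comma-separated field on a line is read as an item, so a row of the collapsed count matrix such as 111, 0, 0, 0, 1, 2, 1 would contain the item 0 three times. A minimal sketch of one way around this, assuming the original long-format file (one row per OrderNo/ItemType pair) is still available and a reasonably recent arules version that supports the header argument, is to read it with format="single". The temporary file below is a hypothetical stand-in built from the rows shown in the question.

```r
library(arules)

# Hypothetical stand-in for the long-format file from the question.
long_csv <- tempfile(fileext = ".csv")
writeLines(c(
  "OrderNo,ItemType,ItemCount",
  "111,Health,1",
  "111,Leisure,2",
  "111,Sports,1",
  "222,Health,3",
  "333,Food,7",
  "333,Clothing,1",
  "444,Clothing,2",
  "444,Health,1",
  "444,Accessories,2"
), long_csv)

# format = "single": each line is one (transaction id, item) pair.
# cols names the id column and the item column; ItemCount is not used.
tr <- read.transactions(
  long_csv,
  format = "single",
  header = TRUE,
  sep = ",",
  cols = c("OrderNo", "ItemType"),
  rm.duplicates = TRUE  # drops repeated items within one order only
)

summary(tr)
inspect(tr)
```

Note that rm.duplicates removes repeated items within an order; different orders that contain the same combination of items are all kept as separate transactions.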

Another way is to use duplicated() on your item set to find the duplicates and unique() to remove them.
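The duplicated()/unique() route can be sketched in R roughly as follows. The data frame is hypothetical (it adds one repeated OrderNo/ItemType row to a few rows from the question), and splitting the item column by order number before coercing to transactions is one possible approach, not necessarily what the answer had in mind.

```r
library(arules)

# Hypothetical long-format data with one repeated (OrderNo, ItemType) row.
orders <- data.frame(
  OrderNo  = c(111, 111, 111, 111, 222, 333, 333),
  ItemType = c("Health", "Leisure", "Sports", "Health",
               "Health", "Food", "Clothing"),
  stringsAsFactors = FALSE
)

# duplicated() flags rows that repeat an earlier (OrderNo, ItemType) pair.
dup_rows <- duplicated(orders[, c("OrderNo", "ItemType")])

# unique() keeps the first occurrence of each pair.
orders_unique <- unique(orders[, c("OrderNo", "ItemType")])

# Build one character vector of items per order, then coerce to transactions.
baskets <- split(orders_unique$ItemType, orders_unique$OrderNo)
tr <- as(baskets, "transactions")
inspect(tr)
```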

Or a simpler approach can be found in this SO post:

Association analysis with duplicate transactions using the arules package in R
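On the question's worry about losing weights: once the data is in a transactions object, orders that share the same combination each count toward that combination's support, so nothing needs to be collapsed. The sketch below reads "most significant" as "most frequent", which is an assumption, and uses hypothetical orders and an arbitrary support threshold to list the five item combinations with the highest support.

```r
library(arules)

# Hypothetical orders; 555 and 666 repeat the combinations of 222 and 444.
baskets <- list(
  "111" = c("Health", "Leisure", "Sports"),
  "222" = c("Health"),
  "333" = c("Food", "Clothing"),
  "444" = c("Clothing", "Health", "Accessories"),
  "555" = c("Health"),
  "666" = c("Clothing", "Health", "Accessories")
)
tr <- as(baskets, "transactions")

# Mine itemsets of at least two items; supp = 0.1 is an arbitrary choice.
itemsets <- apriori(
  tr,
  parameter = list(supp = 0.1, minlen = 2, target = "frequent itemsets"),
  control = list(verbose = FALSE)
)

# Rank by support and keep the five most frequent combinations.
top5 <- head(sort(itemsets, by = "support"), n = 5)
inspect(top5)
```

Setting minlen = 2 restricts the result to combinations of two or more item types; dropping it would also rank single items.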
