Association analysis with duplicate transactions using the arules package in R
Question
I want to create a transaction object in basket format that I can call at any time for my analyses. The data consists of 1001 transactions, each a comma-separated list of items. The first 10 transactions look like this:
hering,corned_b,olives,ham,turkey,bourbon,ice_crea
baguette,soda,hering,cracker,heineken,olives,corned_b
avocado,cracker,artichok,heineken,ham,turkey,sardines
olives,bourbon,coke,turkey,ice_crea,ham,peppers
hering,corned_b,apples,olives,steak,avocado,turkey
sardines,heineken,chicken,coke,ice_crea,peppers,ham
olives,bourbon,coke,turkey,ice_crea,heineken,apples
corned_b,peppers,bourbon,cracker,chicken,ice_crea,baguette
soda,olives,bourbon,cracker,heineken,peppers,baguette
corned_b,peppers,bourbon,cracker,chicken,bordeaux,hering
...
I noticed that the data contains duplicated transactions and removed them, but every time I try to read the transactions I get:
Error in asMethod(object) : can not coerce list with transactions with duplicated items
Here is my code:
data <- read.csv("AssociationsItemList.txt",header=F)
data <- data[!duplicated(data),]
pop <- NULL
for(i in 1:length(data)){
pop <- paste(pop, data[i],sep="\n")
}
write(pop, file = "Trans", sep = ",")
transdata <- read.transactions("Trans", format = "basket", sep=",")
I'm sure I've missed something small but important. Any help would be appreciated.
Recommended answer
The problem is not duplicated transactions (the same row appearing twice) but duplicated items (the same item appearing twice within a single transaction, e.g., "olives" on line 4).
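To make the distinction concrete, here is a minimal base R sketch that checks for both kinds of duplication. The three sample lines are hypothetical and are not taken from the asker's file; with real data you would presumably fill `lines` with something like `readLines("AssociationsItemList.txt")` instead.

```r
# Hypothetical sample data (not from the original file).
# Line 2 repeats the item "olives"; line 3 repeats the whole of line 1.
lines <- c(
  "hering,corned_b,olives,ham",
  "olives,bourbon,coke,olives",
  "hering,corned_b,olives,ham"
)

# Split each transaction into its individual items.
items <- strsplit(lines, ",", fixed = TRUE)

# Duplicated transactions: rows identical to an earlier row.
dup_rows <- which(duplicated(lines))

# Duplicated items: rows in which some item occurs more than once.
dup_items <- which(vapply(items, function(x) anyDuplicated(x) > 0, logical(1)))

print(dup_rows)
print(dup_items)
```

Only rows flagged in `dup_items` would trigger the coercion error; removing the rows in `dup_rows` does not affect it.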
read.transactions has an rm.duplicates argument that removes those duplicates:
read.transactions("Trans", format = "basket", sep=",", rm.duplicates=TRUE)
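As a rough end-to-end sketch, the following writes a few hypothetical transactions to a temporary file and reads them back with `rm.duplicates = TRUE`. The sample lines and the `tmp` file are assumptions made for illustration; the arules package must be installed.

```r
library(arules)

# Hypothetical sample data written to a temporary file.
# The second transaction lists "olives" twice.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "hering,corned_b,olives,ham",
  "olives,bourbon,coke,olives",
  "soda,olives,bourbon"
), tmp)

# rm.duplicates = TRUE drops repeated items within each transaction,
# so the coercion to a transactions object no longer fails.
transdata <- read.transactions(tmp, format = "basket", sep = ",",
                               rm.duplicates = TRUE)

# Inspect the result: one row per transaction, each item listed once.
inspect(transdata)

# Number of distinct items in each transaction.
print(size(transdata))
```

Assigning the result to a variable such as `transdata` keeps the transaction object available for later calls, which is what the question asks for.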