"Zero frequent items" when using eclat to mine frequent itemsets


Problem description

I want to find patterns and "clusters" based on which items are bought together. According to the wiki for eclat:


The Eclat algorithm is used to perform itemset mining. Itemset mining let us find frequent patterns in data like if a consumer buys milk, he also buys bread. This type of pattern is called association rules and is used in many application domains.

However, when I use eclat in R, I get "zero frequent items", and "NULL" when retrieving the results through tidLists. Can anyone see what I am doing wrong?

Full dataset: https://pastebin.com/8GbjnHK2

Each row is a transaction, with its items in separate columns. A quick snapshot of the data:

3060615;;;;;;;;;;;;;;;
3060612;3060616;;;;;;;;;;;;;;
3020703;;;;;;;;;;;;;;;
3002469;;;;;;;;;;;;;;;
3062800;;;;;;;;;;;;;;;
3061943;3061965;;;;;;;;;;;;;;
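
To make the row layout concrete, here is a minimal base-R sketch that splits the rows from the snapshot on ";" and drops the empty padding fields. It only illustrates what each transaction contains; it is not a description of how read.transactions parses the file internally, and the variable names are made up for this example.

```r
# Rows copied from the snapshot above (the trailing ";" are empty padding columns)
rows <- c(
  "3060615;;;;;;;;;;;;;;;",
  "3060612;3060616;;;;;;;;;;;;;;",
  "3020703;;;;;;;;;;;;;;;",
  "3002469;;;;;;;;;;;;;;;",
  "3062800;;;;;;;;;;;;;;;",
  "3061943;3061965;;;;;;;;;;;;;;"
)

# Split each row on ";" and keep only the non-empty fields
baskets <- lapply(strsplit(rows, ";", fixed = TRUE), function(x) x[nzchar(x)])

# Number of items per transaction
print(lengths(baskets))

# Items of the second transaction
print(baskets[[2]])
```

Most rows in this snapshot hold a single item, so any one item or item pair can only cover a small share of all transactions.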

Code

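library(arules)

# Read one transaction per row; items are separated by ";"
trans <- read.transactions("Transactions.csv", format = "basket", sep = ";")

# Mine frequent itemsets and keep the transaction ID lists
f <- eclat(trans, parameter = list(supp = 0.1, maxlen = 17, tidLists = TRUE))

dim(tidLists(f))

as(tidLists(f), "list")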

Could it be due to the data structure? If so, how should I change it? Furthermore, what do I do to get the suggested itemsets? I couldn't figure that out from the wiki.

Edit: I used 0.004 for supp, as suggested by @hpesoj626. But the function seems to be grouping the orders/users rather than the items. I don't know how to export the data, so here is a picture of the tidLists:

Recommended answer

The problem is that you have set your support too high. Try lowering supp, say to supp = .001, for which we get

dim(tidLists(f))

# [1]   928 15840

For your data set, the highest support is 0.08239, which is below 0.1. That is why you get no results with supp = 0.1.
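
As a rough check of that arithmetic, support is the count of transactions containing an itemset divided by the total number of transactions. The sketch below uses the three largest counts from the inspect output in this answer and assumes the total is 15840, read off the second dimension of tidLists(f); the variable names are only illustrative.

```r
# Assumption: 15840 transactions, taken from dim(tidLists(f)) in this answer
n_transactions <- 15840

# Counts of the three most frequent items, copied from the inspect() output
counts <- c("3060620" = 1305, "3060619" = 1150, "3061124" = 901)

# Support = count / number of transactions
support <- counts / n_transactions
print(round(support, 5))

# No item reaches the 0.1 threshold, so eclat has nothing to report
print(any(support >= 0.1))

# The lower thresholds mentioned in this thread do let these items through
print(all(support >= 0.004))
```

With supp = 0.1, an itemset would have to appear in roughly 1584 of the 15840 transactions, while the most frequent item appears in only 1305.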

inspect(head(sort(f, by = "support"), 10))

#      items             support count
# [1]  {3060620}         0.08239 1305 
# [2]  {3060619}         0.07260 1150 
# [3]  {3061124}         0.05688  901 
# [4]  {3060618}         0.05663  897 
# [5]  {4027039}         0.04975  788 
# [6]  {3060617}         0.04564  723 
# [7]  {3061697}         0.04306  682 
# [8]  {3060619,3060620} 0.03087  489 
# [9]  {3039715}         0.02727  432 
# [10] {3045117}         0.02708  429 
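
On the follow-up in the question's edit: tidLists(f) holds, for each frequent itemset, the IDs of the transactions that contain it, which is probably why the output looks grouped by order rather than by item. The itemsets themselves can be read with inspect(f) or items(f). Below is a small self-contained sketch on made-up baskets (the item names and the 0.3 threshold are illustrative only, not taken from the data set above).

```r
library(arules)

# Hypothetical toy baskets, one character vector per transaction
baskets <- list(
  c("milk", "bread"),
  c("milk", "bread", "eggs"),
  c("milk"),
  c("bread"),
  c("eggs")
)
trans <- as(baskets, "transactions")

# Keep itemsets that occur in at least 30% of the 5 transactions (i.e. 2 or more)
f <- eclat(trans,
           parameter = list(supp = 0.3, tidLists = TRUE),
           control = list(verbose = FALSE))

# The frequent itemsets with their support
inspect(sort(f, by = "support"))

# The same itemsets as a plain list of item labels
itemset_list <- as(items(f), "list")

# Transaction IDs per itemset: one row per itemset, one column per transaction
print(dim(tidLists(f)))
```

Here dim(tidLists(f)) has one row per frequent itemset and one column per transaction, which matches the 928 x 15840 shape shown earlier in this answer.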
