"Zero frequent items" when using eclat to mine frequent itemsets

Views: 138 Posted: 2020/10/17 22:03:57 Tags: r data-mining


Problem description

I want to find patterns and "clusters" based on which items are bought together. According to the wiki for eclat:

The Eclat algorithm is used to perform itemset mining. Itemset mining let us find frequent patterns in data like if a consumer buys milk, he also buys bread. This type of pattern is called association rules and is used in many application domains.

However, when I run eclat in R, I get "zero frequent items", and retrieving the results through tidLists returns "NULL". Can anyone see what I am doing wrong?

Full dataset: https://pastebin.com/8GbjnHK2

Each row is one transaction, with its items in separate columns. A quick snapshot of the data:

3060615;;;;;;;;;;;;;;;
3060612;3060616;;;;;;;;;;;;;;
3020703;;;;;;;;;;;;;;;
3002469;;;;;;;;;;;;;;;
3062800;;;;;;;;;;;;;;;
3061943;3061965;;;;;;;;;;;;;;
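
As a rough illustration of how this layout maps to baskets, the six snapshot rows can be split on ";" with the empty padding fields dropped. This is a base R sketch only, not the reader that arules uses internally, and the names `rows`, `baskets` and `basket_sizes` are made up for the example.

```r
# The six snapshot rows shown above, one string per transaction
rows <- c(
  "3060615;;;;;;;;;;;;;;;",
  "3060612;3060616;;;;;;;;;;;;;;",
  "3020703;;;;;;;;;;;;;;;",
  "3002469;;;;;;;;;;;;;;;",
  "3062800;;;;;;;;;;;;;;;",
  "3061943;3061965;;;;;;;;;;;;;;"
)

# Split each row on ";" and drop the empty padding fields,
# leaving one character vector of item IDs per transaction
baskets <- lapply(strsplit(rows, ";", fixed = TRUE),
                  function(x) x[nzchar(x)])

# Number of items in each basket
basket_sizes <- lengths(baskets)
print(basket_sizes)
```

Most rows hold a single item, so only a few transactions can contribute to itemsets of two or more items.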

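Code

library(arules)

# Read one transaction per line ("basket" format), items separated by ";"
trans <- read.transactions("Transactions.csv", format = "basket", sep = ";")

# Mine frequent itemsets, keeping the transaction ID lists for each itemset
f <- eclat(trans, parameter = list(supp = 0.1, maxlen = 17, tidLists = TRUE))

dim(tidLists(f))

as(tidLists(f), "list")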

Could this be due to the data structure? If so, how should I change it? Also, how do I get the suggested itemsets? I couldn't figure that out from the wiki.

Edit: I used 0.004 for supp, as suggested by @hpesoj626, but the function seems to be grouping the orders/users rather than the items. I don't know how to export the data, so I posted a picture of the tidLists.

Recommended answer

The problem is that you have set your support too high. Try lowering supp, say to supp = .001, for which we get

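# Assumed to be the same call as in the question, with only supp lowered
f <- eclat(trans, parameter = list(supp = 0.001, maxlen = 17, tidLists = TRUE))
dim(tidLists(f))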

# [1]   928 15840

For your data set, the highest support is 0.08239, which is below 0.1. That is why you get no results with supp = 0.1.
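
The support values can be checked by hand: support is the number of transactions containing an itemset divided by the total number of transactions. The sketch below uses only figures quoted in this answer (the counts from the inspect output and 15840, the second value of dim(tidLists(f))); the variable names are made up for illustration.

```r
# Total number of transactions: the second value of dim(tidLists(f))
n_transactions <- 15840

# Counts quoted in the inspect() output of this answer
counts <- c("{3060620}" = 1305,
            "{3060619}" = 1150,
            "{3060619,3060620}" = 489)

# Support = transactions containing the itemset / total transactions
support <- counts / n_transactions
print(round(support, 5))

# Check whether any itemset reaches the original threshold of 0.1
print(any(support >= 0.1))
```

Since even the most frequent single item falls short of 0.1, no larger itemset can reach it either, because an itemset is never more frequent than any of its members.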

inspect(head(sort(f, by = "support"), 10))

#      items             support count
# [1]  {3060620}         0.08239 1305 
# [2]  {3060619}         0.07260 1150 
# [3]  {3061124}         0.05688  901 
# [4]  {3060618}         0.05663  897 
# [5]  {4027039}         0.04975  788 
# [6]  {3060617}         0.04564  723 
# [7]  {3061697}         0.04306  682 
# [8]  {3060619,3060620} 0.03087  489 
# [9]  {3039715}         0.02727  432 
# [10] {3045117}         0.02708  429
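
On the follow-up in the question's edit: tidLists(f) lists, for each frequent itemset, the transactions that contain it, which is probably why it looks as if orders are being grouped rather than items. The itemsets themselves are available through items(f), or through inspect(f) as above. A minimal sketch on toy baskets invented for this example (not the question's data), assuming the arules package is installed:

```r
library(arules)

# Toy baskets, invented for this sketch
baskets <- list(
  c("milk", "bread"),
  c("milk"),
  c("milk", "bread", "eggs"),
  c("eggs")
)
trans <- as(baskets, "transactions")

# Keep tidLists so each itemset records which transactions contain it
f <- eclat(trans,
           parameter = list(supp = 0.5, tidLists = TRUE),
           control = list(verbose = FALSE))

# The frequent itemsets themselves: what was bought together
itemset_list <- as(items(f), "list")
print(itemset_list)

# The transactions supporting each itemset: the "orders" view
tid_list <- as(tidLists(f), "list")
print(tid_list)
```

Both lists have one entry per frequent itemset, so they can be read side by side: the first says which items form the set, the second says which transactions contain it.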
