Association rules with Frequent Pattern Mining

Views: 24 | Published: 2021/11/14 21:07:07 | Tags: scala, apache-spark, apache-spark-mllib


Question

I want to extract association rules for a set of transactions with the following Spark (Scala) code:

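import org.apache.spark.mllib.fpm.FPGrowth

// minSupport, minConfidence and transactions (an RDD of item arrays) are defined elsewhere
val fpg = new FPGrowth().setMinSupport(minSupport).setNumPartitions(10)
val model = fpg.run(transactions)
model.generateAssociationRules(minConfidence).collect()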

However, there are more than 10K products, so extracting the rules for all combinations is computationally expensive, and I do not need all of them. I only want to extract pairwise rules:

Product 1 ==> Product 2
Product 1 ==> Product 3
Product 3 ==> Product 1

I do not care about other combinations, such as:

[Product 1] ==> [Product 2, Product 3]
[Product 3,Product 1] ==> Product 2

Is there any way to do that?

Thanks, Amir

Recommended answer

Assuming your transactions look more or less like this:

val transactions = sc.parallelize(Seq(
  Array("a", "b", "e"),
  Array("c", "b", "e", "f"),
  Array("a", "b", "c"),
  Array("c", "e", "f"),
  Array("d", "e", "f")
))

You can try to generate the frequent itemsets manually and apply AssociationRules directly:

import org.apache.spark.mllib.fpm.AssociationRules
import org.apache.spark.mllib.fpm.FPGrowth.FreqItemset

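val freqItemsets = transactions
  .flatMap { xs =>
    // Deduplicate and sort the items so that the same pair always yields the
    // same key, whatever order the items appear in within a transaction
    val items = xs.distinct.sorted
    (items.combinations(1) ++ items.combinations(2)).map(x => (x.toList, 1L))
  }
  .reduceByKey(_ + _)
  .map { case (xs, cnt) => new FreqItemset(xs.toArray, cnt) }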

val ar = new AssociationRules()
  .setMinConfidence(0.8)

val results = ar.run(freqItemsets)
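To see what these counts feed into, here is a small worked example on the five sample transactions above, using plain Scala collections rather than Spark. It relies on the standard definition of confidence for a rule X ==> Y, which is count({X, Y}) / count({X}). The `confidence` helper is a hypothetical name introduced only for this illustration, not part of any Spark API.

```scala
// Plain Scala collections only (no Spark), so the numbers can be checked by hand.
val sampleTransactions = Seq(
  Seq("a", "b", "e"),
  Seq("c", "b", "e", "f"),
  Seq("a", "b", "c"),
  Seq("c", "e", "f"),
  Seq("d", "e", "f")
)

// Count every 1-item and 2-item set; using Set as the key makes item order irrelevant.
val counts: Map[Set[String], Long] = sampleTransactions
  .flatMap { xs =>
    val items = xs.distinct
    (items.combinations(1) ++ items.combinations(2)).map(_.toSet)
  }
  .groupBy(identity)
  .map { case (itemset, occurrences) => (itemset, occurrences.size.toLong) }

// Hypothetical helper: confidence of the pairwise rule antecedent ==> consequent.
// Assumes the antecedent occurs in at least one transaction.
def confidence(antecedent: String, consequent: String): Double =
  counts.getOrElse(Set(antecedent, consequent), 0L).toDouble / counts(Set(antecedent))

println(confidence("a", "b")) // "a" occurs twice, always together with "b"
println(confidence("e", "f")) // "e" occurs four times, three of them with "f"
```

With a minimum confidence of 0.8, a ==> b would pass while e ==> f would not.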

Notes:

Unfortunately, you will have to handle filtering by support manually. This can be done by applying a filter on freqItemsets.
You should consider increasing the number of partitions before the flatMap.
If freqItemsets is too large to be handled, you can split its computation into a few steps to mimic actual FP-growth:

1. Generate 1-patterns and filter by support.
2. Generate 2-patterns using only the frequent patterns from step 1.
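The notes above could be combined roughly as follows. This is only a sketch under some assumptions: `pairRules` and its parameters are hypothetical names, the minimum count is derived from `minSupport` as a fraction of the number of transactions, and the set of frequent single items is assumed to be small enough to collect on the driver and broadcast.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.mllib.fpm.AssociationRules
import org.apache.spark.mllib.fpm.FPGrowth.FreqItemset

// Hypothetical helper (not part of Spark): pairwise rules with manual support filtering.
def pairRules(
    transactions: RDD[Array[String]],
    minSupport: Double,
    minConfidence: Double,
    numPartitions: Int): RDD[AssociationRules.Rule[String]] = {

  // Increase parallelism before the expensive flatMap steps.
  val data = transactions.repartition(numPartitions).cache()
  val minCount = math.ceil(minSupport * data.count()).toLong

  // Step 1: count single items and keep only the frequent ones.
  val singles = data
    .flatMap(xs => xs.distinct.iterator.map(item => (item, 1L)))
    .reduceByKey(_ + _)
    .filter { case (_, cnt) => cnt >= minCount }
    .cache()

  // Assumption: the frequent items fit in driver memory.
  val frequent = data.sparkContext.broadcast(singles.keys.collect().toSet)

  // Step 2: build pairs from frequent items only, then filter by support again.
  // Sorting gives each pair a single canonical key.
  val pairs = data
    .flatMap { xs =>
      xs.distinct
        .filter(item => frequent.value.contains(item))
        .sorted
        .combinations(2)
        .map(pair => (pair.toList, 1L))
    }
    .reduceByKey(_ + _)
    .filter { case (_, cnt) => cnt >= minCount }

  val freqItemsets =
    singles.map { case (item, cnt) => new FreqItemset(Array(item), cnt) } ++
      pairs.map { case (pair, cnt) => new FreqItemset(pair.toArray, cnt) }

  new AssociationRules().setMinConfidence(minConfidence).run(freqItemsets)
}
```

In a Spark shell, something like `pairRules(transactions, 0.3, 0.8, 10).collect().foreach(println)` would print the resulting rules. Because only 1- and 2-item sets are passed to AssociationRules, every rule it returns has a single antecedent and a single consequent.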
