How to limit FPGrowth itemsets to just 2 or 3

Question

I am running the FPGrowth algorithm with PySpark on Python 3.6 in a Jupyter notebook. When I try to save the generated association rules, the output is huge, so I want to limit the number of consequents. Here is the code I tried. I also changed the Spark context parameters.

Maximum Pattern Length fpGrowth (Apache) PySpark

from pyspark.sql.functions import col, size
from pyspark.ml.fpm import FPGrowth
from pyspark.sql import Row
from pyspark.context import SparkContext
from pyspark.sql.session import SparkSession
from pyspark import SparkConf

conf = SparkConf().setAppName("App")
conf = (conf.setMaster('local[*]')
        .set('spark.executor.memory', '100G')
        .set('spark.driver.memory', '400G')
        .set('spark.driver.maxResultSize', '200G'))
sc = SparkContext.getOrCreate(conf=conf)
spark = SparkSession(sc)
R = Row('ID', 'items')
df=spark.createDataFrame([R(i, x) for i, x in enumerate(lol)])
fpGrowth = FPGrowth(itemsCol="items", minSupport=0.7, minConfidence=0.9)

model = fpGrowth.fit(df)
ar=model.associationRules.where(size(col('antecedent')) == 2).where(size(col('consequent')) == 1)

ar.cache()
ar.toPandas().to_csv('output.csv')

It gives an error:


   TypeError                                 Traceback (most recent call last)
   <ipython-input-1-f90c7a9f11ae> in <module>

   ---> 73 ar=model.associationRules.where(size(col('antecedent')) == 2).where(size(col('consequent')) == 1)
   TypeError: 'str' object is not callable

Can someone help me solve this issue?

Here lol is a list of lists of transactions: [['a','b'],['c','a','e']....]

Python: 3.6.5, PySpark, Windows 10

Recommended answer

The discussion above and the following link helped me resolve the problem.

'str' object is not callable TypeError

   import pyspark.sql.functions as func
   model.associationRules.where(func.size(func.col('antecedent')) == 1).where(func.size(func.col('consequent')) == 1).show()
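
The thread does not say why the module-qualified calls work where the bare `size` and `col` did not. One plausible explanation, which is an assumption and not something the source confirms, is that an earlier notebook cell rebound one of those names to a string. The sketch below reproduces that effect in plain Python, using the standard library's `math.floor` as a stand-in for `pyspark.sql.functions.size`; the string value is hypothetical.

```python
import math
from math import floor  # bare-name import, analogous to `from pyspark.sql.functions import size`

# Hypothetical earlier notebook cell: the same name is reused for a string,
# which silently replaces the imported function in the notebook's namespace.
floor = "ground"

try:
    floor(2.5)  # now calls a str, not the function
    shadow_error = None
except TypeError as exc:
    shadow_error = str(exc)

print(shadow_error)  # 'str' object is not callable

# Module-qualified access looks the function up on the module itself,
# so rebinding the bare name does not affect it.
qualified_result = math.floor(2.5)
print(qualified_result)  # 2
```

If this is what happened, restarting the kernel or avoiding variables named `size` or `col` should also clear the error; accessing the functions through the `func` alias simply sidesteps the collision.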
