Maximum Pattern Length fpGrowth (Apache) PySpark

Problem description

I am trying to run association rules using PySpark. I first create an FPGrowth tree and pass that to the association rules method.

However, I would like to add a maximum pattern length parameter to limit the number of items on the LHS and RHS. I only want to keep the pattern length at 2 for associations between items.

## fit model

from pyspark.ml.fpm import FPGrowth

fpGrowth_1 = FPGrowth(itemsCol="collect_set(title_name)", minSupport=.001, minConfidence=0.001)

model_working_1 = fpGrowth_1.fit(transactions_2)

## Display frequent itemsets.

model_working_1.freqItemsets.show()

+--------------------+------+
|               items|  freq|
+--------------------+------+
|[Temptation Islan...|325291|
| [Temptation Island]|282205|
|[Temptation Islan...|175694|
|[S4 - Engl  progr...|171400|
|      [Nieuwe Buren]|168684|
|[Neighboursss, Te...|113113|
|       [Love Island]|146766|
|[Love Island, S4 ...| 65285|
|[Love Island, Tem...|105834|
|[Love Island, Tem...| 83335|
|[Love Island, Tem...|115979|
|[Good Time Sle......|132439|
+--------------------+------+

# Display generated association rules.
model_working_1.associationRules.show()
+--------------------+--------------------+------------------+
|          antecedent|          consequent|        confidence|
+--------------------+--------------------+------------------+
|[Love Island, Tem...| [Temptation Island]|0.7185352520714957|
|[De Beste Verleid...|[Temptation Islan...|0.9147820487266372|
|     [Bella Donna's]|[Temptation Islan...|  0.74988107580655|
|[Binnenkort bij V...|[Temptation Islan...|0.9756179956817415|
|[Married at First...| [Temptation Island]|0.8692627446452283|
|       [Love Island]| [Temptation Island]|0.7211070683945873|
|       [Love Island]|[Temptation Islan...|0.7902307073845442|
|[S4 - Dutch progr...| [Temptation Island]|  0.61975495915986|
|[S4 - Dutch progr...|[Temptation Islan...|0.7550758459743291|
|[The Good Doctor,...| [Temptation Island]| 0.873575189492565|
+--------------------+--------------------+------------------+


# transform examines the input items against all the association rules and summarizes the
# consequents as the prediction

model_working_1.transform(transactions_2).show()

+---------------------+----------------------------------------------------------------------------------------------+
|title_name           |Prediction                                                                                    |
+---------------------+----------------------------------------------------------------------------------------------+
|[Goode Time Bad  ....|[Temptation Island VIPS, S4 - Dutch program viewer, Weg van Jou, The Good Doctor, Moordvrouw, De 12 van Oldenheim, Married at First Sight, Dave en Dien op Ibiza, Temptation Gossip]|
|[S4 - Englis progr...|Lara Croft Tomb Raider, Ronald Goedemondt - Geen Sp
|[Goede Tijden Sl.........|[I Love You Tattoo, S7 - Dutch suspense-series viewer, Temptation Island VIPS, Awkward, Goede Tijden Slechte Tijden, Lost, De Beste Verleiders, Cellblock H]|

The resulting association rules contain very long patterns. I want to keep the pattern length at 2, or maybe a bit more. Right now I am getting too many to interpret or comprehend.

Is there a way to constrain the pattern length in PySpark? I found a link about pattern length in Scala, but nothing like this for PySpark.

I would appreciate it if you could suggest something or help me out in this situation. Thanks in advance!

Recommended answer

In PySpark, you can try:

from pyspark.sql.functions import col, size

# Keep only rules with a single item on each side (pattern length 2).
model.associationRules.where(size(col('antecedent')) == 1).where(size(col('consequent')) == 1).show()
