Wrong output of Mahout PFPGrowth algorithm?
Problem description
I'm using the latest trunk version of Mahout's PFP Growth implementation on top of a Hadoop cluster to determine frequent patterns in the MovieLens dataset. In a previous step I converted the dataset to a list of transactions, since the PFP Growth algorithm needs that input format.
However, the output I get is unexpected.
For example, for item 1017 the only frequent pattern is
1017 ([100,1017, 50])
I would also expect a pattern like ([1017], X) with X >= 50 in that line.
I also tested with an example input
1,2,3
1,2,3
1,3
and the output I get is
1 ([1, 3],3), ([1],3), ([1, 3, 2],2)
2 ([1, 3, 2],2)
3 ([1, 3],3), ([1, 3, 2],2)
Some patterns are missing, for example ([1,2],2).
What is wrong?
Recommended answer
The reason is that the FP algorithm does not output a subset of a frequent pattern unless the subset's support is greater than that of the pattern. It's described here: http://www.searchworkings.org/forum/-/message_boards/view_message/396093
I need to rewrite the code for my use.
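As a sanity check while rewriting, it can help to compare against a complete list of frequent itemsets for the small example from the question. The following is a minimal brute-force sketch in plain Python, not Mahout code: the function name `frequent_itemsets` and the minimum support of 2 are assumptions made for illustration (the question does not state the threshold that was used), and the approach is exponential in the number of items, so it only suits tiny inputs.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute force: map every itemset (as a sorted tuple) that occurs in at
    least min_support transactions to its support count."""
    sets = [frozenset(t) for t in transactions]
    items = sorted(set().union(*sets))
    result = {}
    # Try every non-empty combination of the distinct items.
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            # Support = number of transactions containing all candidate items.
            support = sum(1 for t in sets if t.issuperset(candidate))
            if support >= min_support:
                result[candidate] = support
    return result


# Example input from the question; min_support=2 is an assumed threshold.
transactions = [[1, 2, 3], [1, 2, 3], [1, 3]]
patterns = frequent_itemsets(transactions, min_support=2)

# Print patterns ordered by descending support, then by itemset.
for itemset, support in sorted(patterns.items(), key=lambda kv: (-kv[1], kv[0])):
    print(list(itemset), support)
```

Under these assumptions the full list contains seven itemsets, including [1, 2] with support 2, which is the pattern the PFP Growth output above leaves out.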