Creating k-itemsets from 2-itemsets


Problem description

I have written the following code to generate k-element itemsets from 2-element itemsets. The 2-element sets are passed to candidateItemsetGen as clist1 and clist2.

    public static void candidateItemsetGen(ArrayList<Integer> clist1, ArrayList<Integer> clist2)
    {
        for (int i = 0; i < clist1.size(); i++)
        {
            for (int j = i + 1; j < clist2.size(); j++)
            {
                for (int k = 0; k < clist1.size() - 2; k++)
                {
                    int r = clist1.get(k).compareTo(clist2.get(k));
                    if (r == 0 && clist1.get(k) - 1 == clist2.get(k) - 1)
                    {
                        candidateItemset.add(clist1.get(i), clist1.get(clist1.size() - 1), clist2.get(clist2.size() - 1)); // ** error here
                    }
                }
            }
        }
        // return candidateItemset;
    }

The condition for creating k-itemsets is that clist1(i) == clist2(i) for i = 1,...,k-2 and clist1(k-2) != clist2(k-2). But there is an error in the code at the line I have marked with **. How can I fix this? The idea is that this function generates candidate itemsets, which are then used again as input to generate further candidate itemsets.

Solution

You could optimize that code further if you take into account that each list of itemsets is sorted in lexical order.

For example, let's say that

clist1 = AB, AD, AF, AG, BC, FG

clist2 = BD, FE, FG, FH, FI

With your code, you will compare AB with all the itemsets of clist2.

But you could optimize that by stopping right after BD, because B is larger than the A in AB according to the lexical order. Therefore, no itemset after BD in clist2 will match AB.
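To illustrate the early stop, here is a minimal sketch. It is not the SPMF implementation, and it rests on several assumptions that are not in the question: each itemset is a `List<Integer>` (A=1, B=2, and so on), both lists are sorted lexically, all itemsets have the same length, and two itemsets are joined when their first k-2 items are equal and the last item of the first is smaller than the last item of the second. The class name `CandidateJoin` and the methods `join` and `comparePrefix` are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CandidateJoin {

    // Compares the first len items of a and b lexically.
    // Returns a negative value if a's prefix is smaller, positive if larger, 0 if equal.
    static int comparePrefix(List<Integer> a, List<Integer> b, int len) {
        for (int i = 0; i < len; i++) {
            int cmp = Integer.compare(a.get(i), b.get(i));
            if (cmp != 0) {
                return cmp;
            }
        }
        return 0;
    }

    // Joins itemsets of clist1 with itemsets of clist2 that share the same prefix.
    // Assumption: both lists are sorted lexically and all itemsets have equal length.
    static List<List<Integer>> join(List<List<Integer>> clist1, List<List<Integer>> clist2) {
        List<List<Integer>> candidates = new ArrayList<>();
        for (List<Integer> a : clist1) {
            int prefixLen = a.size() - 1;
            for (List<Integer> b : clist2) {
                int cmp = comparePrefix(a, b, prefixLen);
                if (cmp < 0) {
                    break;      // b's prefix is larger, so every later itemset is larger too
                }
                if (cmp > 0) {
                    continue;   // b's prefix is still smaller, keep scanning
                }
                int lastA = a.get(prefixLen);
                int lastB = b.get(prefixLen);
                if (lastA < lastB) {
                    List<Integer> candidate = new ArrayList<>(a);
                    candidate.add(lastB);   // one element per add() call
                    candidates.add(candidate);
                }
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        // AB, AD, AF, AG, BC, FG
        List<List<Integer>> clist1 = Arrays.asList(
                Arrays.asList(1, 2), Arrays.asList(1, 4), Arrays.asList(1, 6),
                Arrays.asList(1, 7), Arrays.asList(2, 3), Arrays.asList(6, 7));
        // BD, FE, FG, FH, FI
        List<List<Integer>> clist2 = Arrays.asList(
                Arrays.asList(2, 4), Arrays.asList(6, 5), Arrays.asList(6, 7),
                Arrays.asList(6, 8), Arrays.asList(6, 9));
        System.out.println(join(clist1, clist2)); // prints [[2, 3, 4], [6, 7, 8], [6, 7, 9]]
    }
}
```

With the example lists, the inner loop for AB exits as soon as it reaches BD, and the only candidates produced are BCD, FGH and FGI. Note also that each candidate is built with one-element `add` calls on a fresh list, since `ArrayList.add` accepts either a single element or an index plus an element, never three arguments.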

If you want to see the code of an optimized implementation of Apriori, you can check my open source data mining library named SPMF.
