Tuple is subset of another tuple - Apriori algorithm

Question

I'm trying to implement the Apriori algorithm. In one of the final steps, I have two arrays of tuples generated from a list of products.

>>> arr1 = array([(2421,), (35682,), (30690,), ..., (18622,), (18285,), (31269,)],
  dtype=object)

>>> arr2 = array([(2421, 35682), (2421, 30690), (2421, 24852), ..., (18622, 18285),
   (18622, 31269), (18285, 31269)], dtype=object)

The thing is that I need to check which tuples in arr1 are subtuples of tuples in arr2; for example, (2421,) is a subtuple of (2421, 30690).

I tried:

>>> if (2421,) in (2421, 1231):
...    print('Yes')
... else:
...    print('No')

and I get No. I also tried using .issubset, but I get an AttributeError.

I would like to know how I can do this without going the hardcore (brute-force) way, given these sizes:

>>> print(len(arr1), len(arr2))
(9258, 263616)

I'm using a Jupyter notebook with Python 2, and only numpy, pandas and itertools.

The desired output is as follows: if I have products 1, 2, 3 but only consider the tuples (1,) and (2,), then I need (1, 2) but not (1, 3) from all the 2-combinations of products.

Recommended answer

If you are implementing the Apriori algorithm, you want to use actual sets instead of tuples. Python has two set types, set and frozenset, where the latter is immutable and thus can be stored in dictionaries or other sets. You probably want to use the latter so you can associate such sets with support scores.
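As a minimal sketch of that idea, frozensets can serve as dictionary keys that map each itemset to a support count. The transactions and variable names below are made up for illustration and are not taken from the question:

```python
from itertools import combinations

# Hypothetical toy transactions; the product IDs are invented for illustration.
transactions = [
    frozenset([1, 2, 3]),
    frozenset([1, 2]),
    frozenset([2, 3]),
]

# All distinct items seen in any transaction, in a stable order.
items = sorted(set().union(*transactions))

# Map each candidate 2-itemset (a frozenset, hence hashable) to its support count.
support = {}
for pair in combinations(items, 2):
    candidate = frozenset(pair)
    # issubset() checks that every item of the candidate occurs in the transaction.
    support[candidate] = sum(1 for t in transactions if candidate.issubset(t))

print(support[frozenset([1, 2])])
```

Because frozensets compare by content, frozenset([2, 1]) looks up the same entry as frozenset([1, 2]), which is harder to guarantee with tuples unless they are always kept sorted.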

That's certainly the approach that the apyori project uses; apyori is a pure-Python implementation of the Apriori algorithm.

You can do subset tests with tuples, but this is a slow O(NM) operation for tuples of sizes N and M:

def tuple_is_subset(ta, tb):
    return all(tav in tb for tav in ta)

That's a full loop over the N items in ta, and each tav in tb test takes up to M = len(tb) steps.
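For illustration, here is how that function behaves on tuples from the question, next to the plain in test, which only checks whether the left-hand tuple is itself an element of the right-hand one. The function is repeated so the snippet runs on its own:

```python
def tuple_is_subset(ta, tb):
    # True when every element of ta also occurs somewhere in tb.
    return all(tav in tb for tav in ta)

# `in` looks for the tuple (2421,) as an element of the other tuple,
# and the elements there are plain integers, so this is False.
print((2421,) in (2421, 30690))

# The element-wise test finds 2421 inside the second tuple.
print(tuple_is_subset((2421,), (2421, 30690)))

# 35682 is missing from the second tuple.
print(tuple_is_subset((2421, 35682), (2421, 30690)))
```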

You can convert the tuples to sets, but that also takes O(N) + O(M) time, after which the subset test takes O(N) time. That makes the whole operation linear in time, but for small tuples I suspect that the constant cost of creating new objects will outweigh the theoretically more costly O(NM) all() test above.
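One way to check that suspicion on your own machine is a quick timeit run. This is only a sketch: the helper name set_is_subset, the tuple contents and the repetition count are arbitrary choices, and timings vary by interpreter and hardware, so no expected numbers are given:

```python
import timeit

def tuple_is_subset(ta, tb):
    # O(N*M) element-wise membership test on plain tuples.
    return all(tav in tb for tav in ta)

def set_is_subset(ta, tb):
    # Builds temporary sets, then does a hash-based subset test.
    return set(ta).issubset(tb)

ta, tb = (2421,), (2421, 30690)

# Both approaches must agree before their timings are worth comparing.
assert tuple_is_subset(ta, tb) == set_is_subset(ta, tb)

tuple_time = timeit.timeit(lambda: tuple_is_subset(ta, tb), number=100000)
set_time = timeit.timeit(lambda: set_is_subset(ta, tb), number=100000)
print("all() on tuples: %.4f s" % tuple_time)
print("set.issubset():  %.4f s" % set_time)
```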

To use sets, you can use:

set(ta).issubset(tb)

Here set.issubset() accepts any iterable, not only sets; for a non-set argument, a temporary set object is created for the test.
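Applied to the small example from the question, a hedged sketch might look like the following. It assumes the retained 1-tuples can be flattened into one set of items, and it uses short made-up lists (kept_singles, pairs) in place of the real arr1 and arr2:

```python
from itertools import combinations

# Hypothetical stand-ins for arr1 and arr2 from the question.
kept_singles = [(1,), (2,)]
pairs = list(combinations([1, 2, 3], 2))  # (1, 2), (1, 3), (2, 3)

# Flatten the retained 1-tuples into a single set of items, built once.
kept_items = set(item for single in kept_singles for item in single)

# Keep only the pairs whose items were all retained.
selected = [pair for pair in pairs if kept_items.issuperset(pair)]
print(selected)  # prints [(1, 2)]
```

Building the set once up front means each pair only costs a few hash lookups, rather than a comparison against every 1-tuple.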
