Tuple is subset of another tuple - Apriori algorithm
Question
I'm trying to implement the apriori algorithm. In one of the final steps I have two arrays of tuples generated from a list of products.
>>> arr1 = array([(2421,), (35682,), (30690,), ..., (18622,), (18285,), (31269,)],
dtype=object)
>>> arr2 = array([(2421, 35682), (2421, 30690), (2421, 24852), ..., (18622, 18285),
(18622, 31269), (18285, 31269)], dtype=object)
The thing is that I need to check which of the tuples in arr1 are subtuples of tuples in arr2, i.e. (2421,) is a subtuple of (2421, 30690).
I tried:
>>> if (2421,) in (2421, 1231):
...     print('Yes')
... else:
...     print('No')
and I get No. I also tried using .issubset, but I get an AttributeError.
I would like to know how I can do this without going the hardcore way, given the sizes of the arrays:
>>> print(len(arr1), len(arr2))
(9258, 263616)
I'm using a Jupyter notebook with Python 2. Only using numpy, pandas and itertools.
The desired output should be of this form: if I have products 1, 2, 3 but I only consider the tuples (1,) and (2,), then I need (1, 2) but not (1, 3) from all the 2-combinations of products.
Recommended answer
If you are implementing the Apriori algorithm, you want to use actual sets instead of tuples. Python has two set types, set and frozenset, where the latter is immutable and thus can be used as a dictionary key or stored in other sets. You probably want to use the latter so you can associate such sets with support scores.
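As a minimal sketch of that idea, the snippet below keys a dictionary on frozenset itemsets and stores a support score for each. The product IDs come from the question, but the baskets in transactions, the support() helper, and the name support_by_itemset are all assumptions made up for illustration.

```python
# Hypothetical baskets; only the product IDs are taken from the question.
transactions = [
    frozenset([2421, 35682]),
    frozenset([2421, 30690]),
    frozenset([2421, 35682, 30690]),
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    # float() keeps the division exact on Python 2 as well as Python 3.
    return hits / float(len(transactions))

candidates = [frozenset([2421]), frozenset([35682]), frozenset([2421, 35682])]

# frozensets are hashable, so they can serve directly as dictionary keys.
support_by_itemset = {c: support(c, transactions) for c in candidates}
```

Because frozensets compare by content, frozenset([35682, 2421]) looks up the same entry as frozenset([2421, 35682]); item order does not matter the way it does with tuples.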
That's certainly the approach that the apyori project uses; apyori is a pure-Python implementation of the Apriori algorithm.
You can do subset tests with tuples, but this is a slow O(NM) operation for tuples of sizes N and M:
def tuple_is_subset(ta, tb):
    # True when every element of ta also appears in tb
    return all(tav in tb for tav in ta)
That's a full loop over the N items in ta, and each tav in tb test takes up to M = len(tb) steps.
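For illustration, here is the same helper applied to the tuples from the question, next to the membership test the question tried. The helper is repeated so the snippet runs on its own.

```python
def tuple_is_subset(ta, tb):
    # True when every element of ta also appears in tb
    return all(tav in tb for tav in ta)

# Element-wise check: 2421 is one of the items in (2421, 30690).
print(tuple_is_subset((2421,), (2421, 30690)))   # True
print(tuple_is_subset((2421,), (35682, 30690)))  # False

# The question's test asks whether the tuple (2421,) is itself an
# element of (2421, 1231), which it is not; hence the 'No'.
print((2421,) in (2421, 1231))                   # False
```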
You can convert the tuples to sets, but that too takes O(N) + O(M) time, after which the subset test takes O(N) time. That makes the whole operation take linear time, but for small tuples I suspect that the constant cost of creating new objects will outweigh the theoretically more costly O(NM) all() test above.
To use sets, you can use:
set(ta).issubset(tb)
Because set.issubset() accepts any iterable, not just a set, only ta needs an explicit conversion; the method creates a temporary set object from tb for the test.
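Applied to the small example from the question (products 1, 2, 3 with only (1,) and (2,) under consideration), a set-based filter over the 2-combinations might look like the sketch below. The names products, kept_singles, allowed and pairs are assumptions for illustration, and the check as written only covers the step from 1-tuples to pairs.

```python
from itertools import combinations

products = [1, 2, 3]
kept_singles = [(1,), (2,)]

# Flatten the kept 1-tuples into one set of allowed items.
allowed = set(item for single in kept_singles for item in single)

# Keep a pair only if both of its items are allowed; issuperset(),
# like issubset(), accepts any iterable, so the tuple is passed as-is.
pairs = [pair for pair in combinations(products, 2)
         if allowed.issuperset(pair)]

print(pairs)  # [(1, 2)]
```

Building the allowed set once costs O(len(kept_singles)), and each pair is then checked with constant-time lookups, which avoids comparing every tuple in one array against every tuple in the other.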