Find two pairs of pairs that sum to the same value


Question


I have random 2d arrays which I make using

import numpy as np
from itertools import combinations
n = 50
A = np.random.randint(2, size=(n,n))


I would like to determine whether the matrix has two distinct pairs of rows that sum to the same row vector. I am looking for a fast method to do this. My current method just tries all possibilities.

for pair in combinations(combinations(range(n), 2), 2):
    if np.array_equal(A[pair[0][0]] + A[pair[0][1]], A[pair[1][0]] + A[pair[1][1]]):
        print("Pair found", pair)


A method that worked for n = 100 would be really great.
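(Before the accepted answer below: for moderate n, one simple way to beat the all-pairs-of-pairs comparison is to hash each pair's summed row and look for a collision; that is O(n^2) pairs instead of O(n^4) comparisons. This is a sketch of that idea, not from the post — the helper name `find_equal_pair_sums` is mine.)

```python
import numpy as np
from itertools import combinations

def find_equal_pair_sums(A):
    """Return two distinct row pairs of A with equal sums, or None.

    Hashes each pair's summed row (as bytes), so a repeated sum is
    detected as a dictionary collision.
    """
    seen = {}
    for i, j in combinations(range(len(A)), 2):
        key = (A[i] + A[j]).tobytes()
        if key in seen:
            return seen[key], (i, j)
        seen[key] = (i, j)
    return None

A = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
print(find_equal_pair_sums(A))  # -> ((0, 1), (2, 3)); rows 0+1 and 2+3 both sum to [1, 1]
```

For large n this still materializes O(n^2) keys, which is what the lazier column-wise approach below avoids.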

Answer


Here is a 'lazy' approach that scales up to n=10000, using 'only' 4 GB of memory and completing each call in about 10 s. Worst-case complexity is O(n^3), but for random data the expected performance is O(n^2). At first sight it seems you'd need O(n^3) ops: each row combination needs to be produced and inspected at least once. But we need not look at the entire row. Rather, we can apply an early-exit strategy when comparing row pairs, as soon as it is clear they are of no use to us; for random data we can typically draw this conclusion long before we have considered all the columns in a row.

import numpy as np
from functools import reduce    #reduce is no longer a builtin in Python 3

n = 10
#also works for non-square A
A = np.random.randint(2, size=(n*2,n)).astype(np.int8)
#force the inclusion of some hits, to keep our algorithm on its toes
##A[0] = A[1]


def base_pack_lazy(a, base, dtype=np.uint64):
    """
    pack the last axis of an array as minimal base representation
    lazily yields packed columns of the original matrix
    """
    a = np.ascontiguousarray( np.rollaxis(a, -1))
    init = np.zeros(a.shape[1:], dtype)
    packing = int(np.dtype(dtype).itemsize * 8 / np.log2(base))    #base-`base` digits that fit in dtype without overflow
    for columns in np.array_split(a, (len(a)-1)//packing+1):
        yield reduce(
            lambda acc,inc: acc*base+inc,
            columns,
            init)
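To make the packing concrete, here is a tiny standalone illustration (a toy example of mine, not part of the answer's code): each group of columns is folded into a single integer per row, most-significant digit first, so equality of packed values means equality of all the underlying digits.

```python
import numpy as np
from functools import reduce

# Pack each row's digits in base 3 (the base used for row pairs,
# since entries of a pair sum are 0, 1 or 2).
a = np.array([[1, 0, 2],
              [0, 2, 1]], dtype=np.uint64)
base = 3
# Fold columns left to right: acc = acc*base + next column.
packed = reduce(lambda acc, col: acc * base + col, a.T, np.zeros(2, np.uint64))
print(packed)  # [1*9 + 0*3 + 2, 0*9 + 2*3 + 1] -> [11 7]
```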

def unique_count(a):
    """returns counts of unique elements"""
    unique, inverse = np.unique(a, return_inverse=True)
    count = np.zeros(len(unique), np.int64)
    np.add.at(count, inverse, 1)        #note; this scatter operation requires numpy 1.8; use a sparse matrix otherwise!
    return unique, count, inverse
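(Aside, an assumption on your NumPy version: on NumPy 1.9 and later, `np.unique` can return the counts directly via `return_counts=True`, which gives the same information as `unique_count` above without the scatter add.)

```python
import numpy as np

a = np.array([3, 1, 3, 2, 3])
unique, inverse, count = np.unique(a, return_inverse=True, return_counts=True)
print(unique)              # [1 2 3]
print(count)               # [1 1 3]
print(count[inverse] > 1)  # mask of elements whose value occurs more than once
```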

def has_identical_row_sums_lazy(A, combinations_index):
    """
    compute the existence of combinations of rows summing to the same vector,
    given an nxm matrix A and an index matrix specifying all combinations

    naively, we need to compute the sum of each row combination at least once, giving n^3 computations
    however, this isnt strictly required; we can lazily consider the columns, giving an early exit opportunity
    all nicely vectorized of course
    """

    multiplicity, combinations = combinations_index.shape
    #list of indices into combinations_index, denoting possibly interacting combinations
    active_combinations = np.arange(combinations, dtype=np.uint32)

    for packed_column in base_pack_lazy(A, base=multiplicity+1):       #loop over packed cols
        #compute rowsums only for a fixed number of columns at a time.
        #this is O(n^2) rather than O(n^3), and after considering the first column,
        #we can typically already exclude almost all rowpairs
        partial_rowsums = sum(packed_column[I[active_combinations]] for I in combinations_index)
        #find duplicates in this column
        unique, count, inverse = unique_count(partial_rowsums)
        #prune those pairs which we can exclude as having different sums, based on columns inspected thus far
        active_combinations = active_combinations[count[inverse] > 1]
        #early exit; no pairs
        if len(active_combinations)==0:
            return False
    return True

def has_identical_triple_row_sums(A):
    n = len(A)
    idx = np.array( [(i,j,k)
        for i in range(n)
            for j in range(n)
                for k in range(n)
                    if i<j and j<k], dtype=np.uint16)
    idx = np.ascontiguousarray( idx.T)
    return has_identical_row_sums_lazy(A, idx)

def has_identical_double_row_sums(A):
    n = len(A)
    idx = np.array(np.tril_indices(n,-1), dtype=np.int32)
    return has_identical_row_sums_lazy(A, idx)


from time import perf_counter    #time.clock was removed in Python 3.8
t = perf_counter()
for i in range(10):
    print(has_identical_double_row_sums(A))
    print(has_identical_triple_row_sums(A))
print(perf_counter() - t)


Extended to include the calculation over sums of triplets of rows, as you asked above. For n=100, this still takes only about 0.2s
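The size of the combination-index matrices is what drives both the memory use and the pruning workload; a quick sanity check of how they grow, using the standard library's `math.comb`:

```python
from math import comb

n = 100
print(comb(n, 2))  # 4950 row pairs
print(comb(n, 3))  # 161700 row triples
```

At n=10000 the pair index alone has about 5e7 entries, which is consistent with the memory figure quoted above.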

Edit: some cleanup; Edit 2: some more cleanup.
