一种快速算法,用于在二进制矩阵中精确地找到k列,以使这些列的总和为1-vector [英] Fast algorithm to find exactly k columns in binary matrix such that the sum of those columns is the 1-vector

查看:53
本文介绍了一种快速算法,用于在二进制矩阵中精确地找到k列,以使这些列的总和为1-vector的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

假设我有一个(M x N)个二进制矩阵,其中M和N都可以很大.我想精确地找到k列(k相对较小,例如小于10),以使这些k列的总和为1-vector(所有元素均为1).一种解决方案是足够的.有一个快速的算法吗?

例如,在矩阵上运行的算法

  1 0 01 0 01 1 00 1 1 

k = 2应该返回列0和2,但是如果k = 1或k = 3则不报告任何解决方案.

我尝试了两种方法:

  1. 一种慢速组合方法,其中我尝试了所有(N个选择k个)组合,然后找到了总和为1个向量的组合.这在O(N ^ k)时间中运行,这显然是可怕的.
  2. 一种递归方法,该方法速度更快,但仍在O(N ^ k)个最坏情况下运行.Python代码如下:

 将numpy导入为npdef recursiveFn(mat,col_used_bool,col_sum_to_date,cols_to_go):N = len(垫)如果cols_to_go == 1:col_unused = 1-col_sum_to_date如果[mat中的i的list(i)中的list(col_unused):返回(正确,[col_unused])别的:返回(False,无)对于范围(N)中的col_id:如果col_used_bool [col_id]:继续如果2不在mat [col_id] + col_sum_to_date中:col_used_bool [col_id] = Truex = recursiveFn(mat,col_used_bool,mat [col_id] + col_sum_to_date,cols_to_go-1)col_used_bool [col_id] = False如果x [0]:返回(真,x [1] + [mat [col_id]])返回(False,无)exMat = [[1,1,1,0],[0,0,1,1],[0,0,0,1]]#由列输入exMat = [用于exMat中的i的np.asarray(i)]k = 2输出= recursiveFn(mat = exMat,col_used_bool = [对于exMat中的i为假],col_sum_to_date = np.asarray([0代表exMat [0中的i为0]],cols_to_go = k)打印(输出[1])###打印此内容:[array([0,0,0,1]),array([1,1,1,0])] 

我对这两种方法都不满意,并且我觉得存在一种更智能,更快速的算法.非常感谢您的帮助.这是我在StackOverflow上的第一篇文章,所以,如果我在某个地方制作了人造馅饼,请和我保持温柔!

((如果有兴趣,我也在Math Stack Exchange上提出了相同的问题,但是我在较少关注算法效率,而更关注数学技术.)

解决方案

我的第一次尝试是整数-使用一种可用的高性能求解器(例如 Cbc )进行编程.

假设您的发病率矩阵中有稀疏性,那么这些效率将非常高且非常笼统(附带约束/适应).它们也很完整,可能会证明不可行.

一个简单的公式如下所示:

 实例c0 c1 c21 0 0 r01 0 0 r11 1 0 r20 1 1 r3IP:最小化(0)#恒定目标|纯可行性问题sum(c_i)= k#选择的列目标r0 = 1 = c0#r0仅显示约束的起源;没有实际变数!r1 = 1 = c0r2 = 1 = c0 + c1r3 = 1 = c1 + c2{0,1}中的c_i#所有变量均为二进制 

可能有可能通过其他不等式(例如集团不等式(冲突图->最大值))来加强这种表述,但不确定是否有帮助.优秀的求解器会动态地生成 cuts 来做类似动态的事情.

有很多理论可用.一个关键字是确切的封面或所有非常相似的打包/覆盖问题.

简单的代码示例:

 将cvxpy导入为cp将numpy导入为np数据= np.array([[1,0,0],[1、0、0],[1,1,0],[0,1,1]])defsolve(k,数据):c = cp.Variable(data.shape [1],boolean = True)con = [data * c == 1cp.sum(c)== k,c> = 0,c< = 1]obj = cp.Minimize(0)问题= cp.Problem(obj,con)problem.solve(详细=正确,求解器= cp.GLPK_MI)if(problem.status =='最优'):返回np.where(np.isclose(c.value,1.0)== True)[0]别的:断言问题.状态=='不可行'不返回打印(解决(2,数据))打印(解决(1,数据))打印(解决(3,数据))#[0 2]# 没有任何# 没有任何 

备注:

  • 代码使用了非常强大的cvxpy,但是缺少一些高级的整数编程支持
    • 唯一易于使用的非商业求解器是 GLPK ,这非常好,但通常无法与 Cbc
    • 竞争
    • cvxpy的代数用法以及一些接口决定导致不寻常的变量边界作为约束在这里表达

Suppose I have an (M x N) binary matrix where both M and N can be large. I want to find exactly k columns (k is relatively small, say less than 10) such that the sum of those k columns is the 1-vector (all elements are 1). One solution is adequate. Is there a fast algorithm for this?

For example, an algorithm working on the matrix

1 0 0
1 0 0
1 1 0
0 1 1

with k=2 should return columns 0 and 2, but should report no solutions if k=1 or k=3.

I've tried two approaches:

  1. The slow combinatorial approach where I try all (N choose k) combinations and find the combination that sums to the 1-vector. This runs in O(N^k) time which is obviously horrendous.
  2. A recursive approach, which is faster but still runs in O(N^k) worst-case time. The Python code is as below:

import numpy as np

def recursiveFn(mat, col_used_bool, col_sum_to_date, cols_to_go):
    N = len(mat)
    if cols_to_go == 1:
        col_unused = 1 - col_sum_to_date
        if list(col_unused) in [list(i) for i in mat]:
            return (True, [col_unused])
        else:
            return (False, None)
    for col_id in range(N):
        if col_used_bool[col_id]:
            continue
        if 2 not in mat[col_id]+col_sum_to_date:
            col_used_bool[col_id] = True
            x = recursiveFn(mat, col_used_bool, mat[col_id]+col_sum_to_date, cols_to_go-1)
            col_used_bool[col_id] = False
            if x[0]:
                return (True, x[1] + [mat[col_id]])
    return (False, None)

exMat = [[1,1,1,0],[0,0,1,1],[0,0,0,1]] #input by colums
exMat = [np.asarray(i) for i in exMat]
k = 2
output = recursiveFn(mat = exMat, col_used_bool = [False for i in exMat], 
    col_sum_to_date = np.asarray([0 for i in exMat[0]]), cols_to_go = k)
print(output[1])
### prints this : [array([0, 0, 0, 1]), array([1, 1, 1, 0])]

I'm unsatisfied with either of these approaches, and I feel that a smarter and faster algorithm exists. Thanks very much for your help. This is my first post on StackOverflow, so please be gentle with me if I made a faux-pas somewhere!

(If interested, I've also asked the same question on Math Stack Exchange, but there I'm less concerned about algorithmic efficiency and more concerned about mathematical techniques.)

解决方案

My first attempt would be integer-programming using one of the available high-performance solvers (e.g. Cbc).

Assuming some sparsity in your incidence-matrix, those will be very efficient and are quite general (side-constraints / adaptations). They are also complete and might be able to prove infeasibility.

A simple formulation would look like:

Instance

c0 c1 c2
1  0  0  r0
1  0  0  r1
1  1  0  r2
0  1  1  r3

IP:

minimize(0)        # constant objective | pure feasibility problem

sum(c_i) = k       # target of columns chosen

r0 = 1 = c0        # r0 just showing the origin of the constraint; no real variable!
r1 = 1 = c0
r2 = 1 = c0 + c1
r3 = 1 = c1 + c2

c_i in {0, 1}      # all variables are binary

It might be possible to strenghten this formulation by additional inequalities like clique-inequalities (conflict-graph -> maximal-cliques), but not sure if that helps. Good solvers will do something similar dynamically be generating cuts.

A lot of theory is available. One keyword would be exact cover or all those packing/covering problems which are very similar.

Simple code-example:

import cvxpy as cp
import numpy as np

data = np.array([[1, 0, 0],
                 [1, 0, 0],
                 [1, 1, 0],
                 [0, 1, 1]])

def solve(k, data):
  c = cp.Variable(data.shape[1], boolean=True)

  con = [data * c == 1,
         cp.sum(c) == k,
         c >= 0,
         c <= 1]

  obj = cp.Minimize(0)
  
  problem = cp.Problem(obj, con)
  problem.solve(verbose=True, solver=cp.GLPK_MI)

  if(problem.status == 'optimal'):
    return np.where(np.isclose(c.value, 1.0) == True)[0]
  else:
    assert problem.status == 'infeasible'
    return None

print(solve(2, data))
print(solve(1, data))
print(solve(3, data))

# [0 2]
# None
# None

Remarks:

  • The code uses cvxpy which is very powerful, but lacks some advanced integer-programming support
    • The only easy to use non-commercial solver is GLPK, which is very good, but usually cannot compete with Cbc
    • The very algebraic usage of cvxpy together with some interface-decisions lead to the unusual variable-bounds as constraints formulation here

这篇关于一种快速算法,用于在二进制矩阵中精确地找到k列,以使这些列的总和为1-vector的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆