Find all binary strings of a certain weight as fast as possible

Question

I want to find binary strings of a certain weight. The number of such strings grows to the point of a memory error, so I'm currently generating them with a generator. This code generates all length-n binary strings with weight k:

import itertools

def kbits(n, k):
    # each k-subset of the positions 0..n-1 marks where the 1s go
    for bits in itertools.combinations(range(n), k):
        s = ['0'] * n
        for bit in bits:
            s[bit] = '1'
        yield ''.join(s)

for b in kbits(length, weight):
    print(b)

So for length = 3 and weight = 2, we get 110, 101, 011.

My research requires me to parse through values such as n = 56 and k = 7, which takes around 24 hours on my device. I'd also like to try n = 72 and k = 8, which (based on the time of the previous result) may take 365 days. So I'm wondering two things:
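For a sense of scale, the number of length-n strings of weight k is the binomial coefficient C(n, k), which Python's math.comb computes directly. The sketch below compares the two cases from the question; the time figure is a hypothetical extrapolation that assumes run time grows linearly with the number of strings (the per-string cost, e.g. a larger least-squares problem, may grow too).

```python
import math

# Number of length-n binary strings with exactly k ones is C(n, k).
small = math.comb(56, 7)   # 231,917,400 strings
large = math.comb(72, 8)   # 11,969,016,345 strings

# Hypothetical extrapolation: assume time is proportional to the number of
# strings, starting from the ~24 hours reported for n = 56, k = 7.
hours_small = 24
hours_large = hours_small * large / small
print(f"{large / small:.1f}x more strings, roughly {hours_large / 24:.0f} days")
# prints: 51.6x more strings, roughly 52 days
```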

  1. Is this the quickest non-memory-intensive way of generating these binary strings?

  2. Is it possible to have multiple cores of my CPU working on this at once? I'm assuming itertools walks through a single sequence. If we had, say, a 2-core CPU, could the first core handle the first 50% of the sequence and the second core the latter half?
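A minimal sketch of one way to split the work across cores, assuming the work for each string is independent: every weight-k string is determined by the positions of its 1s, so the combinations can be partitioned by the position of the first 1, and each part handed to a multiprocessing.Pool worker. The names combos_with_first, work, and parallel_count are hypothetical, and work only counts strings as a stand-in for the real computation.

```python
import itertools
import math
from multiprocessing import Pool

def combos_with_first(n, k, first):
    """All k-subsets of range(n) whose smallest element is `first` (assumes k >= 1)."""
    for rest in itertools.combinations(range(first + 1, n), k - 1):
        yield (first,) + rest

def work(args):
    """Hypothetical per-task job: counts its strings; replace with the real work."""
    n, k, first = args
    return sum(1 for _ in combos_with_first(n, k, first))

def parallel_count(n, k, processes=2):
    # One task per possible position of the first 1; the pool hands tasks to cores.
    tasks = [(n, k, first) for first in range(n - k + 1)]
    with Pool(processes) as pool:
        return sum(pool.imap_unordered(work, tasks))

if __name__ == "__main__":
    print(parallel_count(20, 4))  # 4845, i.e. math.comb(20, 4)
```

The parts are unequal (the one whose first 1 is at position 0 holds a fraction k/n of all strings), so creating one task per position and letting imap_unordered hand them out usually balances the load far better than splitting the positions into two fixed halves.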

Perhaps I should mention that for each binary string b, I'd like to perform the following least-squares computation, where N is some predefined matrix:

import numpy as np

for b in kbits(size, max_coclique):
    v = np.linalg.lstsq(N, np.array(list(b), dtype=float), rcond=None)  # (solution, residuals, rank, singular values)

That is, I need each b ultimately as a NumPy array of 0/1 values (unless there is some extremely fast way of doing all of this, including the least-squares computation, differently).
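If most of the time goes into building b and calling lstsq once per string, one option worth timing is to build the 0/1 vectors directly as NumPy columns (skipping strings entirely) and solve many of them per call: np.linalg.lstsq accepts a 2-D right-hand side of shape (M, K) and solves for each column. This is a sketch, assuming N has n rows; kbit_batches, solve_all, and batch_size are hypothetical names.

```python
import itertools
import numpy as np

def kbit_batches(n, k, batch_size=10_000):
    """Yield float arrays of shape (n, m), m <= batch_size, whose columns are the
    weight-k 0/1 vectors, in the same order as the question's kbits."""
    combos = itertools.combinations(range(n), k)
    while True:
        chunk = list(itertools.islice(combos, batch_size))
        if not chunk:
            return
        rows = np.array(chunk, dtype=np.intp)   # (m, k) positions of the 1s
        cols = np.arange(len(chunk))[:, None]   # (m, 1) column index of each string
        batch = np.zeros((n, len(chunk)))
        batch[rows, cols] = 1.0                 # column j gets 1s at rows[j]
        yield batch

def solve_all(N, n, k, batch_size=10_000):
    """Least squares for every weight-k vector, one lstsq call per batch."""
    for B in kbit_batches(n, k, batch_size):
        X, residuals, rank, sv = np.linalg.lstsq(N, B, rcond=None)
        yield X  # column j solves N x = B[:, j] in the least-squares sense
```

Since N does not change, computing np.linalg.pinv(N) once and multiplying it by each batch is another variant worth timing.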

Note: I'm also running this in Sage, as I am utilizing its database of transitive groups.

Answer

Given a value with weight k, you can get the lexicographically next value (the next larger number with the same weight) as follows:

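  1. Find the rightmost 0 that is to the left of the rightmost 1.
  2. Move a 1 from the right into that 0.
  3. Move the remaining 1s to the right of that 0 as far right as possible.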

This is the binary version of the Pandita algorithm: https://en.wikipedia.org/wiki/Permutation#Generation_in_lexicographic_order
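To make the three steps concrete, this sketch applies them literally to a binary string (leftmost character most significant). It is meant for checking the idea on small cases rather than for speed; next_same_weight is a hypothetical helper name.

```python
def next_same_weight(s):
    """Apply the three steps to a binary string s; return None if s is the last one."""
    last_one = s.rfind('1')              # position of the rightmost 1
    if last_one == -1:
        return None                      # weight 0: only one string exists
    zero = s.rfind('0', 0, last_one)     # step 1: rightmost 0 left of that 1
    if zero == -1:
        return None                      # all 1s already packed to the left
    ones = s.count('1', zero + 1)        # 1s currently to the right of that 0
    tail = len(s) - zero - 1             # length of the part right of that 0
    # step 2: one of those 1s moves into the 0;
    # step 3: the other ones - 1 ones are packed as far right as possible
    return s[:zero] + '1' + '0' * (tail - (ones - 1)) + '1' * (ones - 1)
```

Starting from '0' * (n - k) + '1' * k and calling it repeatedly visits the strings in increasing numeric order, which is the order the bit-manipulation version produces.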

You can do it with bit manipulations like this:

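def kbits(n, k):
    # Yield all n-bit strings with exactly k ones, in increasing numeric order
    if k == 0:
        yield '0' * n  # avoids the division by zero below when val == 0
        return
    limit = 1 << n
    val = (1 << k) - 1                    # smallest value with k ones: 0...01...1
    while val < limit:
        yield "{0:0{1}b}".format(val, n)  # zero-padded to n digits
        minbit = val & -val               # rightmost 1 bit
        fillbit = (val + minbit) & ~val   # rightmost 0 to the left of that bit
        # set fillbit, clear the run of 1s below it, then refill the lowest bits
        val = val + minbit | (fillbit // (minbit << 1)) - 1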

There are probably some opportunities for optimization remaining, but the time will be dominated by formatting the values as binary strings in the yield statement.
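Following that observation, a hedged variant that skips string formatting: yield plain integers and, only where an array is needed, unpack the bits into a 0/1 NumPy array with int.to_bytes and np.unpackbits. Going through bytes avoids overflow for n > 64 (for example n = 72), where the value no longer fits in a NumPy int64. kbits_int and to_bits are hypothetical names, the sketch assumes 1 <= k <= n, and it has not been benchmarked against the string version.

```python
import numpy as np

def kbits_int(n, k):
    """Same bit trick as kbits above, but yields ints (assumes 1 <= k <= n)."""
    limit = 1 << n
    val = (1 << k) - 1
    while val < limit:
        yield val
        minbit = val & -val               # rightmost 1 bit
        fillbit = (val + minbit) & ~val   # rightmost 0 to the left of that bit
        val = val + minbit | (fillbit // (minbit << 1)) - 1

def to_bits(val, n):
    """Length-n float array of 0/1 digits of val, most significant bit first."""
    nbytes = (n + 7) // 8
    raw = np.frombuffer(val.to_bytes(nbytes, "big"), dtype=np.uint8)
    return np.unpackbits(raw)[-n:].astype(float)  # drop the leading padding bits
```

The array order matches the formatted string, so to_bits(val, n) holds the same digits as "{0:0{1}b}".format(val, n).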
