N random, contiguous and non-overlapping subsequences each of length

Question

I'm trying to get n random, non-overlapping slices of a sequence, where each slice has length l, preferably in the order they appear.

This is the code I have so far. It has gotten messier with each attempt to make it work and, needless to say, it doesn't work.

def rand_parts(seq, n, l):
    """
    return n random non-overlapping partitions each of length l.
    If n * l > len(seq) raise error.
    """
    if n * l > len(seq):
        raise Exception('length of seq too short for given n, l arguments')
    if not isinstance(seq, list):
        seq = list(seq)
    gaps = [0] * (n + 1)
    for g in xrange(len(seq) - (n * l)):
        gaps[random.randint(0, len(gaps) - 1)] += 1
    result = []
    for i, g in enumerate(gaps):
        x = g + (i * l)
        result.append(seq[x:x+l])
        if i < len(gaps) - 1:
            gaps[i] += x
    return result

For example, if we call rand_parts([1, 2, 3, 4, 5, 6], 2, 2), there are 6 possible results it could return, as shown in the following diagram:

[1, 2, 3, 4, 5, 6]
 ____  ____

[1, 2, 3, 4, 5, 6]
 ____     ____ 

[1, 2, 3, 4, 5, 6]
 ____        ____ 

[1, 2, 3, 4, 5, 6]
    ____  ____ 

[1, 2, 3, 4, 5, 6]
    ____     ____ 

[1, 2, 3, 4, 5, 6]
       ____  ____

So [[3, 4], [5, 6]] would be acceptable, but [[3, 4], [4, 5]] wouldn't because the two slices overlap, and [[2, 4], [5, 6]] wouldn't either because [2, 4] isn't contiguous.

I encountered this problem while doing a little code golfing, so for interest's sake it would be nice to see a simple solution, an efficient one, or both. I'm not so much interested in fixing my existing code.

Recommended answer

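import random

def rand_parts(seq, n, l):
    # Sample n distinct positions from a "compressed" index range in which
    # every part occupies one slot instead of l.
    indices = range(len(seq) - (l - 1) * n)
    result = []
    offset = 0
    for i in sorted(random.sample(indices, n)):
        # Shift right by the extra room used by the parts already emitted.
        i += offset
        result.append(seq[i:i+l])
        offset += l - 1
    return result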

To understand this, first consider the case l == 1. Then the function basically just returns a random.sample() of the input data in sorted order (each element wrapped in a slice of length 1); in this case the offset variable is always 0.
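
As a minimal sketch of that l == 1 special case (the name rand_items is hypothetical and not part of the answer), sampling positions and sorting them yields a random selection of elements in their original order:

```python
import random

def rand_items(seq, n):
    # Hypothetical helper: the l == 1 special case of rand_parts.
    # Pick n distinct positions, then visit them in ascending order.
    positions = sorted(random.sample(range(len(seq)), n))
    # Each result is a slice of length 1, mirroring seq[i:i+l] with l == 1.
    return [seq[i:i + 1] for i in positions]

print(rand_items([1, 2, 3, 4, 5, 6], 2))  # e.g. [[2], [5]]
```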

The case where l > 1 is an extension of the previous case. We still use random.sample() to pick positions, but we maintain an offset that shifts each successive result to the right. This ensures that the ranges do not overlap: consecutive start positions are at least l apart from each other, rather than at least 1.
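
One way to check this reasoning is to enumerate every result the function could produce. The sketch below is not part of the answer; all_parts is a hypothetical helper that replaces random.sample() with itertools.combinations and applies the same shift, written as k * (l - 1) for the k-th pick:

```python
from itertools import combinations

def all_parts(seq, n, l):
    # Hypothetical helper: enumerate every result rand_parts could return
    # by walking all sorted n-subsets of the compressed index range.
    compressed = range(len(seq) - (l - 1) * n)
    results = []
    for picks in combinations(compressed, n):
        # The k-th pick is shifted right by k * (l - 1), which is the value
        # of `offset` in rand_parts when that pick is processed.
        starts = [p + k * (l - 1) for k, p in enumerate(picks)]
        results.append([seq[s:s + l] for s in starts])
    return results

for parts in all_parts([1, 2, 3, 4, 5, 6], 2, 2):
    print(parts)
```

For the example from the question this prints six results, one per row of the diagram. Because random.sample() picks each subset of positions with equal probability, each of these placements should be equally likely.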
