将数字列表拆分为n个块,以使这些块具有(接近)相等的总和并保持原始顺序 [英] Split a list of numbers into n chunks such that the chunks have (close to) equal sums and keep the original order

查看:98
本文介绍了将数字列表拆分为n个块,以使这些块具有(接近)相等的总和并保持原始顺序的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这不是标准的分区问题,因为我需要保持列表中元素的顺序。

This is not the standard partitioning problem, as I need to maintain the order of elements in the list.

例如,如果我有一个列表

So for example if I have a list

[1, 6, 2, 3, 4, 1, 7, 6, 4]

,而我想要两个块,则拆分应为

and I want two chunks, then the split should give

[[1, 6, 2, 3, 4, 1], [7, 6, 4]] 

,每边总计17。对于三个块,结果将为

for a sum of 17 on each side. For three chunks the result would be

[[1, 6, 2, 3], [4, 1, 7], [6, 4]]

总计12、12和10。

for sums of 12, 12, and 10.

编辑以获取其他说明

我目前将总和除以块数并将其用作一个目标,然后迭代直到我接近那个目标。问题在于某些数据集可能会使算法混乱,例如,尝试将以下内容分为3:-

I currently divide the sum with the number of chunks and use that as a target, then iterate till I get close to that target. The problem is that certain data sets can mess the algorithm up, for example trying to divide the following into 3:-

[95, 15, 75, 25, 85, 5]

总和为300,目标为100。第一个块将总计为95,第二个将总计为90,第三个将总计为110,并且5将为剩余。将其追加到应有的位置将得到95、90、115,而更合理的解决方案应为110、100、90。

Sum is 300, target is 100. The first chunk would sum to 95, second would be sum to 90, third would sum to 110, and 5 would be 'leftover'. Appending it where it's supposed to be would give 95, 90, 115, where a more 'reasonable' solution would be 110, 100, 90.

结束编辑

背景:

我有一个包含高度不同的文本(歌词)的列表,我想将文本分成任意数量的列。目前,我是根据所有行的总高度计算目标高度的,但显然这是一个始终被低估的情况,在某些情况下,这会导致最优解(最后一列明显更高)。

I have a list containing text (song lyrics) of varying heights, and I want to divide the text into an arbitrary number of columns. Currently I calculate a target height based on the total height of all lines, but obviously this is a consistent underestimate, which in some cases results in a suboptimal solution (the last column is significantly taller).

推荐答案

此方法定义分区边界,该边界将数组划分为大致相等数量的元素,然后反复搜索更好的分区,直到找不到为止。它与大多数其他已发布的解决方案的不同之处在于,它通过尝试多个不同的分区来寻找最佳解决方案。其他解决方案试图通过数组的一次通过来创建一个良好的分区,但是我想不出保证最优的一次通过算法。

This approach defines partition boundaries that divide the array in roughly equal numbers of elements, and then repeatedly searches for better partitionings until it can't find any more. It differs from most of the other posted solutions in that it looks to find an optimal solution by trying multiple different partitionings. The other solutions attempt to create a good partition in a single pass through the array, but I can't think of a single pass algorithm that's guaranteed optimal.

此处的代码是此算法的有效实现,但可能很难理解,因此在末尾包含了更具可读性的版本。

The code here is an efficient implementation of this algorithm, but it can be hard to understand so a more readable version is included as an addendum at the end.

def partition_list(a, k):
    if k <= 1: return [a]
    if k >= len(a): return [[x] for x in a]
    partition_between = [(i+1)*len(a)/k for i in range(k-1)]
    average_height = float(sum(a))/k
    best_score = None
    best_partitions = None
    count = 0

    while True:
        starts = [0]+partition_between
        ends = partition_between+[len(a)]
        partitions = [a[starts[i]:ends[i]] for i in range(k)]
        heights = map(sum, partitions)

        abs_height_diffs = map(lambda x: abs(average_height - x), heights)
        worst_partition_index = abs_height_diffs.index(max(abs_height_diffs))
        worst_height_diff = average_height - heights[worst_partition_index]

        if best_score is None or abs(worst_height_diff) < best_score:
            best_score = abs(worst_height_diff)
            best_partitions = partitions
            no_improvements_count = 0
        else:
            no_improvements_count += 1

        if worst_height_diff == 0 or no_improvements_count > 5 or count > 100:
            return best_partitions
        count += 1

        move = -1 if worst_height_diff < 0 else 1
        bound_to_move = 0 if worst_partition_index == 0\
                        else k-2 if worst_partition_index == k-1\
                        else worst_partition_index-1 if (worst_height_diff < 0) ^ (heights[worst_partition_index-1] > heights[worst_partition_index+1])\
                        else worst_partition_index
        direction = -1 if bound_to_move < worst_partition_index else 1
        partition_between[bound_to_move] += move * direction

def print_best_partition(a, k):
    print 'Partitioning {0} into {1} partitions'.format(a, k)
    p = partition_list(a, k)
    print 'The best partitioning is {0}\n    With heights {1}\n'.format(p, map(sum, p))

a = [1, 6, 2, 3, 4, 1, 7, 6, 4]
print_best_partition(a, 1)
print_best_partition(a, 2) 
print_best_partition(a, 3)
print_best_partition(a, 4)

b = [1, 10, 10, 1]
print_best_partition(b, 2)

import random
c = [random.randint(0,20) for x in range(100)]
print_best_partition(c, 10)

d = [95, 15, 75, 25, 85, 5]
print_best_partition(d, 3)

可能要根据您的操作进行一些修改。例如,要确定是否已找到最佳分区,该算法将在分区之间没有高度差异时停止,它找不到比连续5次或100次迭代后看到的最佳结果更好的东西总迭代作为万能的停止点。您可能需要调整这些常数或使用其他方案。如果您的身高构成了复杂的价值观,那么知道何时停止就可能陷入试图逃脱局部最大值之类的经典问题。

There may be some modifications to make depending on what you are doing with this. For example, to determine whether the best partitioning has been found, this algorithm stops when there is no height difference among partitions, it doesn't find anything better than the best thing it's seen for more than 5 iterations in a row, or after 100 total iterations as a catch-all stopping point. You may need to adjust those constants or use a different scheme. If your heights form a complex landscape of values, knowing when to stop can get into classic problems of trying to escape local maxima and things like that.

Partitioning [1, 6, 2, 3, 4, 1, 7, 6, 4] into 1 partitions
The best partitioning is [[1, 6, 2, 3, 4, 1, 7, 6, 4]]
With heights [34]

Partitioning [1, 6, 2, 3, 4, 1, 7, 6, 4] into 2 partitions
The best partitioning is [[1, 6, 2, 3, 4, 1], [7, 6, 4]]
With heights [17, 17]

Partitioning [1, 6, 2, 3, 4, 1, 7, 6, 4] into 3 partitions
The best partitioning is [[1, 6, 2, 3], [4, 1, 7], [6, 4]]
With heights [12, 12, 10]

Partitioning [1, 6, 2, 3, 4, 1, 7, 6, 4] into 4 partitions
The best partitioning is [[1, 6], [2, 3, 4], [1, 7], [6, 4]]
With heights [7, 9, 8, 10]

Partitioning [1, 10, 10, 1] into 2 partitions
The best partitioning is [[1, 10], [10, 1]]
With heights [11, 11]

Partitioning [7, 17, 17, 1, 8, 8, 12, 0, 10, 20, 17, 13, 12, 4, 1, 1, 7, 11, 7, 13, 9, 12, 3, 18, 9, 6, 7, 19, 20, 17, 7, 4, 3, 16, 20, 6, 7, 12, 16, 3, 6, 12, 9, 4, 3, 2, 18, 1, 16, 14, 17, 7, 0, 14, 13, 3, 5, 3, 1, 5, 5, 13, 16, 0, 16, 7, 3, 8, 1, 20, 16, 11, 15, 3, 10, 10, 2, 0, 12, 12, 0, 18, 20, 3, 10, 9, 13, 12, 15, 6, 14, 16, 6, 12, 9, 9, 16, 14, 19, 1] into 10 partitions
The best partitioning is [[7, 17, 17, 1, 8, 8, 12, 0, 10, 20], [17, 13, 12, 4, 1, 1, 7, 11, 7, 13, 9], [12, 3, 18, 9, 6, 7, 19, 20], [17, 7, 4, 3, 16, 20, 6, 7, 12], [16, 3, 6, 12, 9, 4, 3, 2, 18, 1, 16], [14, 17, 7, 0, 14, 13, 3, 5, 3, 1, 5, 5], [13, 16, 0, 16, 7, 3, 8, 1, 20, 16], [11, 15, 3, 10, 10, 2, 0, 12, 12, 0, 18], [20, 3, 10, 9, 13, 12, 15, 6, 14], [16, 6, 12, 9, 9, 16, 14, 19, 1]]
With heights [100, 95, 94, 92, 90, 87, 100, 93, 102, 102]

Partitioning [95, 15, 75, 25, 85, 5] into 3 partitions
The best partitioning is [[95, 15], [75, 25], [85, 5]]
With heights [110, 100, 90]



编辑



添加了新的测试用例[95,15 ,75、25、85、5],此方法可以正确处理。

Edit

Added the new test case, [95, 15, 75, 25, 85, 5], which this method handles correctly.

此版本该算法更易于阅读和理解,但由于较少利用内置Python功能而使算法更长。

This version of the algorithm is easier to read and understand, but is a bit longer due to taking less advantage of built-in Python features. It seems to execute in a comparable or even slightly faster amount of time, however.

#partition list a into k partitions
def partition_list(a, k):
    #check degenerate conditions
    if k <= 1: return [a]
    if k >= len(a): return [[x] for x in a]
    #create a list of indexes to partition between, using the index on the
    #left of the partition to indicate where to partition
    #to start, roughly partition the array into equal groups of len(a)/k (note
    #that the last group may be a different size) 
    partition_between = []
    for i in range(k-1):
        partition_between.append((i+1)*len(a)/k)
    #the ideal size for all partitions is the total height of the list divided
    #by the number of paritions
    average_height = float(sum(a))/k
    best_score = None
    best_partitions = None
    count = 0
    no_improvements_count = 0
    #loop over possible partitionings
    while True:
        #partition the list
        partitions = []
        index = 0
        for div in partition_between:
            #create partitions based on partition_between
            partitions.append(a[index:div])
            index = div
        #append the last partition, which runs from the last partition divider
        #to the end of the list
        partitions.append(a[index:])
        #evaluate the partitioning
        worst_height_diff = 0
        worst_partition_index = -1
        for p in partitions:
            #compare the partition height to the ideal partition height
            height_diff = average_height - sum(p)
            #if it's the worst partition we've seen, update the variables that
            #track that
            if abs(height_diff) > abs(worst_height_diff):
                worst_height_diff = height_diff
                worst_partition_index = partitions.index(p)
        #if the worst partition from this run is still better than anything
        #we saw in previous iterations, update our best-ever variables
        if best_score is None or abs(worst_height_diff) < best_score:
            best_score = abs(worst_height_diff)
            best_partitions = partitions
            no_improvements_count = 0
        else:
            no_improvements_count += 1
        #decide if we're done: if all our partition heights are ideal, or if
        #we haven't seen improvement in >5 iterations, or we've tried 100
        #different partitionings
        #the criteria to exit are important for getting a good result with
        #complex data, and changing them is a good way to experiment with getting
        #improved results
        if worst_height_diff == 0 or no_improvements_count > 5 or count > 100:
            return best_partitions
        count += 1
        #adjust the partitioning of the worst partition to move it closer to the
        #ideal size. the overall goal is to take the worst partition and adjust
        #its size to try and make its height closer to the ideal. generally, if
        #the worst partition is too big, we want to shrink the worst partition
        #by moving one of its ends into the smaller of the two neighboring
        #partitions. if the worst partition is too small, we want to grow the
        #partition by expanding the partition towards the larger of the two
        #neighboring partitions
        if worst_partition_index == 0:   #the worst partition is the first one
            if worst_height_diff < 0: partition_between[0] -= 1   #partition too big, so make it smaller
            else: partition_between[0] += 1   #partition too small, so make it bigger
        elif worst_partition_index == len(partitions)-1: #the worst partition is the last one
            if worst_height_diff < 0: partition_between[-1] += 1   #partition too small, so make it bigger
            else: partition_between[-1] -= 1   #partition too big, so make it smaller
        else:   #the worst partition is in the middle somewhere
            left_bound = worst_partition_index - 1   #the divider before the partition
            right_bound = worst_partition_index   #the divider after the partition
            if worst_height_diff < 0:   #partition too big, so make it smaller
                if sum(partitions[worst_partition_index-1]) > sum(partitions[worst_partition_index+1]):   #the partition on the left is bigger than the one on the right, so make the one on the right bigger
                    partition_between[right_bound] -= 1
                else:   #the partition on the left is smaller than the one on the right, so make the one on the left bigger
                    partition_between[left_bound] += 1
            else:   #partition too small, make it bigger
                if sum(partitions[worst_partition_index-1]) > sum(partitions[worst_partition_index+1]): #the partition on the left is bigger than the one on the right, so make the one on the left smaller
                    partition_between[left_bound] -= 1
                else:   #the partition on the left is smaller than the one on the right, so make the one on the right smaller
                    partition_between[right_bound] += 1

def print_best_partition(a, k):
    #simple function to partition a list and print info
    print '    Partitioning {0} into {1} partitions'.format(a, k)
    p = partition_list(a, k)
    print '    The best partitioning is {0}\n    With heights {1}\n'.format(p, map(sum, p))

#tests
a = [1, 6, 2, 3, 4, 1, 7, 6, 4]
print_best_partition(a, 1)
print_best_partition(a, 2) 
print_best_partition(a, 3)
print_best_partition(a, 4)
print_best_partition(a, 5)

b = [1, 10, 10, 1]
print_best_partition(b, 2)

import random
c = [random.randint(0,20) for x in range(100)]
print_best_partition(c, 10)

d = [95, 15, 75, 25, 85, 5]
print_best_partition(d, 3)

这篇关于将数字列表拆分为n个块,以使这些块具有(接近)相等的总和并保持原始顺序的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆