将文件合并到大小相等的目录中 [英] binning files into approximately equal sized directories

查看:77
本文介绍了将文件合并到大小相等的目录中的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

给出格式的文件名的排序列表

Given a sorted list of filenames of the form

{artist}-{title}.mp3

我想将文件分发到255个bin(子目录)中,以使每个bin包含大约相等数量的文件,并且限制艺术家是原子的",即,没有艺术家应该在一个以上的目录中分发曲目.结果也应保持排序(即忽略合并,我们仍然具有与列表相同的顺序).

I want to distribute the files into 255 bins (subdirectories) so that each bin contains approximately equal numbers of files, with the restriction that artists are "atomic" i.e. that no artist should have tracks distributed over more than one directory. The result should remain sorted aswell (i.e. ignoring the binning, we still have the same ordering of the list).

我已经尝试过的方法:通过这种方法,我将列表分为255个部分:

What I've already tried: I partition the list into exactly 255 parts by this method:

def partition(lst, n):
    q, r = divmod(len(lst), n)
    indices = [q * i + min(i, r) for i in range(n + 1)]
    result = [lst[indices[i]:indices[i + 1]] for i in range(n)]
    assert sum(len(x) for x in result) == len(lst)
    assert len(set(len(x) for x in result)) <= 2
    return result

然后我通过将艺术家移至上一个垃圾箱(如果他们已经有另一首曲目)的方式来强制执行限制,即艺术家是原子的.这种方法是次优且无效的,因为我留下了很多空箱子(由于在某些情况下,同一位艺术家拥有许多曲目)

And then I go through and enforce the restriction that artists are atomic by moving them into the previous bin if they already have another track there. This method is sub-optimal and broken, because I'm left with a lot of empty bins (due to having, in some cases, many tracks by same artist)

推荐答案

在破解了一个小时之后,我想到了一个更好的解决方案.它的工作原理是创建要跟踪的音乐人dict,然后通过在相邻条目很小时贪婪地配对它们,将其降低到255个键.

After hacking away at it for an hour, here's a better solution I came up with. It works by creating a dict of artist to tracks, and then whittling it down to 255 keys by greedily pairing up adjacent entries when they are small.

我确信它仍然不是最佳的,但是它是可行的,如果有人有类似的问题需要解决,我将在这里进行复制:

I'm sure it's still not optimal, but it's workable and I'll reproduce it here in case anyone has a similar problem to solve:

d = collections.OrderedDict()
# build an OrderedDict of artist to track(s)
for fn in files:
    artist, title = fn.split('-')
    if (artist,) not in d:
        d[artist,] = []
    d[artist,].append(fn)

def window(seq, n=2):
    it = iter(seq)
    result = tuple(itertools.islice(it, n))
    if len(result) == n:
        yield result    
    for elem in it:
        result = result[1:] + (elem,)
        yield result

while len(d) > 255:
    # find the pair of adjacent keys with minimal number of files contained
    k1, k2 = min(window(d), key=lambda x: len(d[x[0]] + d[x[1]]))
    # join them into the first key and nuke the latter
    d[k1] += d.pop(k2)

这篇关于将文件合并到大小相等的目录中的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆