拆分长字符串而不会破坏单词填充线 [英] Splitting long string without breaking words fulfilling lines

查看:123
本文介绍了拆分长字符串而不会破坏单词填充线的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在您认为它是重复的之前(有很多问题问如何在不破坏单词的情况下拆分长字符串)要记住我的问题有点不同:顺序并不重要,我必须适合这些单词为了尽可能多地使用每一行.

我有一组无序的单词,我想将它们组合起来而又不使用超过253个字符.

I've a unordered set of words and I want to combine them without using more than 253 characters.

def compose(words):
    result = " ".join(words)
    if len(result) > 253:
        pass # this should not happen!
    return result

我的问题是我想尝试尽可能多地填充行.例如:

My problem is that I want to try to fill the line as much as possible. For example:

words = "a bc def ghil mno pq r st uv"
limit = 5 # max 5 characters

# This is good because it's the shortest possible list,
#   but I don't know how could I get it
# Note: order is not important
good = ["a def", "bc pq", "ghil", "mno r", "st uv"]

# This is bad because len(bad) > len(good)
#   even if the limit of 5 characters is respected
# This is equivalent to:
#   bad  = ["a bc", "def", "ghil", "mno", "pq r", "st uv"]
import textwrap
bad = textwrap.wrap(words, limit)

我该怎么办?

推荐答案

非最优离线快速一维bin打包Python算法

def binPackingFast(words, limit, sep=" "):
    if max(map(len, words)) > limit:
        raise ValueError("limit is too small")
    words.sort(key=len, reverse=True)
    res, part, others = [], words[0], words[1:]
    for word in others:
        if len(sep)+len(word) > limit-len(part):
            res.append(part)
            part = word
        else:
            part += sep+word
    if part:
        res.append(part)
    return res

性能

通过/usr/share/dict/words (由 words-3.0-20.fc18.noarch 提供)进行测试,它可以在一秒钟内完成50万个单词我的慢速双核笔记本电脑,使用以下参数的效率至少为90%:

Tested over /usr/share/dict/words (provided by words-3.0-20.fc18.noarch) it can do half million words in a second on my slow dual core laptop, with an efficiency of at least 90% with those parameters:

limit = max(map(len, words))
sep = ""

使用 limit * = 1.5 ,我得到92%,使用 limit * = 2 ,我得到96%(相同的执行时间).

With limit *= 1.5 I get 92%, with limit *= 2 I get 96% (same execution time).

最优(理论)值的计算方法如下: math.ceil(len(sep.join(words))/limit)

Optimal (theoretical) value is calculated with: math.ceil(len(sep.join(words))/limit)

不能保证有效率的装箱算法做得更好

no efficient bin-packing algorithm can be guaranteed to do better

来源: http://mathworld.wolfram.com/Bin-PackingProblem.html

故事的寓意

虽然找到最佳解决方案很有趣,但我认为在大多数情况下,将这种算法用于一维脱机装箱问题会更好.

While it's interesting to find the best solution, I think that for the most cases it would be much better to use this algorithm for 1D offline bin packing problems.

资源

注释

这篇关于拆分长字符串而不会破坏单词填充线的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆