拆分长字符串而不会破坏单词填充线 [英] Splitting long string without breaking words fulfilling lines
问题描述
在您认为它是重复的之前(有很多问题问如何在不破坏单词的情况下拆分长字符串)要记住我的问题有点不同:顺序并不重要,我必须适合这些单词为了尽可能多地使用每一行.
我有一组无序的单词,我想将它们组合起来而又不使用超过253个字符.
I've a unordered set of words and I want to combine them without using more than 253 characters.
def compose(words):
result = " ".join(words)
if len(result) > 253:
pass # this should not happen!
return result
我的问题是我想尝试尽可能多地填充行.例如:
My problem is that I want to try to fill the line as much as possible. For example:
words = "a bc def ghil mno pq r st uv"
limit = 5 # max 5 characters
# This is good because it's the shortest possible list,
# but I don't know how could I get it
# Note: order is not important
good = ["a def", "bc pq", "ghil", "mno r", "st uv"]
# This is bad because len(bad) > len(good)
# even if the limit of 5 characters is respected
# This is equivalent to:
# bad = ["a bc", "def", "ghil", "mno", "pq r", "st uv"]
import textwrap
bad = textwrap.wrap(words, limit)
我该怎么办?
推荐答案
非最优离线快速一维bin打包Python算法
def binPackingFast(words, limit, sep=" "):
if max(map(len, words)) > limit:
raise ValueError("limit is too small")
words.sort(key=len, reverse=True)
res, part, others = [], words[0], words[1:]
for word in others:
if len(sep)+len(word) > limit-len(part):
res.append(part)
part = word
else:
part += sep+word
if part:
res.append(part)
return res
性能
通过/usr/share/dict/words
(由 words-3.0-20.fc18.noarch
提供)进行测试,它可以在一秒钟内完成50万个单词我的慢速双核笔记本电脑,使用以下参数的效率至少为90%:
Tested over /usr/share/dict/words
(provided by words-3.0-20.fc18.noarch
) it can do half million words in a second on my slow dual core laptop, with an efficiency of at least 90% with those parameters:
limit = max(map(len, words))
sep = ""
使用 limit * = 1.5
,我得到92%,使用 limit * = 2
,我得到96%(相同的执行时间).
With limit *= 1.5
I get 92%, with limit *= 2
I get 96% (same execution time).
最优(理论)值的计算方法如下: math.ceil(len(sep.join(words))/limit)
Optimal (theoretical) value is calculated with: math.ceil(len(sep.join(words))/limit)
不能保证有效率的装箱算法做得更好
no efficient bin-packing algorithm can be guaranteed to do better
来源: http://mathworld.wolfram.com/Bin-PackingProblem.html
故事的寓意
虽然找到最佳解决方案很有趣,但我认为在大多数情况下,将这种算法用于一维脱机装箱问题会更好.
While it's interesting to find the best solution, I think that for the most cases it would be much better to use this algorithm for 1D offline bin packing problems.
资源
注释
- 我没有在我的实现中使用textwrap,因为它比我的简单Python代码慢.可能与以下内容有关:为什么是textwrap.wrap()和textwrap.fill()这么慢?
- 即使排序不颠倒,它似乎也能很好地工作.
这篇关于拆分长字符串而不会破坏单词填充线的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!