Algorithm to divide text into 3 evenly-sized groups

Views: 172 | Posted: 2015/11/30 15:27:17 | Tags: algorithm, sorting


Question


I would like to create an algorithm that divides text into 3 evenly-sized groups (based on text length). Since the result will be used for line breaks, the order of the text needs to be preserved.

For example, the string:

Just testing to see how this works.

would be split into:

Just testing   // 12 characters
to see how     // 10 characters
this works.    // 11 characters

Any ideas?
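For a handful of words and exactly 3 groups, the problem can also be brute-forced: try every pair of break points and keep the split whose line lengths are closest to the mean, scored as the sum of squared differences. This is a minimal sketch, assuming at least three words and non-empty lines; `split_three_bruteforce` is a hypothetical name, not something from the question or answer. It does roughly cubic work in the number of words, so it is mainly useful as a reference for checking a faster method on short inputs.

```python
from itertools import combinations


def split_three_bruteforce(text):
    """Try every pair of break points and keep the split whose line
    lengths have the smallest sum of squared deviations from their mean."""
    words = text.split()
    if len(words) < 3:
        raise ValueError("need at least 3 words for 3 non-empty lines")
    best_cost, best_lines = float("inf"), None
    # a and b are the word indices where the 2nd and 3rd lines start;
    # drawing them from 1..len(words)-1 with a < b keeps every line non-empty
    for a, b in combinations(range(1, len(words)), 2):
        lines = [" ".join(words[:a]), " ".join(words[a:b]), " ".join(words[b:])]
        lengths = [len(line) for line in lines]
        mean = sum(lengths) / 3
        cost = sum((length - mean) ** 2 for length in lengths)
        if cost < best_cost:
            best_cost, best_lines = cost, lines
    return best_lines


print(split_three_bruteforce("Just testing to see how this works."))
# ['Just testing', 'to see how', 'this works.']
```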

Recommended answer


The "minimum raggedness" dynamic program, also from the Wikipedia article on word wrap, can be adapted to your needs. Set LineWidth = (len(text) - (n - 1))/n, the average line length once the n - 1 spaces at the breaks are removed, and ignore the comment about infinite penalties for exceeding the line width; use the definition of c(i, j) as is, with P = 2.
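To make the cost concrete, the sketch below evaluates c(i, j) = (LineWidth - width of words[i:j])^2 for the three lines of the question's example. LineWidth is computed the same way as in the answer's code, (total length - (n - 1)) / n, which gives 11.0 for the 35-character string; `line_cost` is a hypothetical helper name used only for illustration.

```python
def line_cost(words, i, j, linewidth, p=2):
    """c(i, j): penalty for putting words[i:j] on one line."""
    width = len(" ".join(words[i:j]))  # word lengths plus the j - i - 1 spaces
    return (linewidth - width) ** p


words = "Just testing to see how this works.".split()
n = 3
total = len(" ".join(words))        # 35 characters
linewidth = (total - (n - 1)) / n   # (35 - 2) / 3 = 11.0
costs = [line_cost(words, i, j, linewidth) for i, j in [(0, 2), (2, 5), (5, 7)]]
print(costs, sum(costs))  # [1.0, 1.0, 0.0] 2.0
```

The lines are 12, 10 and 11 characters wide, so the total penalty is 1 + 1 + 0 = 2; the DP searches for the split that minimizes this sum.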

Code. I took the liberty of modifying the DP to always return exactly n lines, at the cost of increasing the running time from O(#words ** 2) to O(#words ** 2 * n).

```python
def minragged(text, n=3):
    """
    Split text into exactly n lines, minimizing the sum over lines of
    (linewidth - actual line width) ** 2.

    >>> minragged('Just testing to see how this works.')
    ['Just testing', 'to see how', 'this works.']
    >>> minragged('Just testing to see how this works.', 10)
    ['', '', 'Just', 'testing', 'to', 'see', 'how', 'this', 'works.', '']
    """
    words = text.split()
    # cumwordwidth[k] is the total length of words[0:k]
    cumwordwidth = [0]
    for word in words:
        cumwordwidth.append(cumwordwidth[-1] + len(word))  # [-1] is the last element
    totalwidth = cumwordwidth[-1] + len(words) - 1  # len(words) - 1 spaces
    linewidth = (totalwidth - (n - 1)) / n  # n - 1 spaces become line breaks

    def cost(i, j):
        """Cost of a line words[i], ..., words[j - 1] (words[i:j])."""
        actuallinewidth = max(j - i - 1, 0) + (cumwordwidth[j] - cumwordwidth[i])
        return (linewidth - actuallinewidth) ** 2

    # best[l][k][0] is the min total cost for words 0, ..., k - 1 on l lines
    # best[l][k][1] is a minimizing index for the start of the last line
    best = [[(0.0, None)] + [(float('inf'), None)] * len(words)]
    # range(1, n + 1) is the interval 1, 2, ..., n
    for l in range(1, n + 1):
        best.append([])
        for j in range(len(words) + 1):
            # the last line is words[k:j]; k == j allows an empty line
            best[l].append(min((best[l - 1][k][0] + cost(k, j), k) for k in range(j + 1)))
    lines = []
    b = len(words)
    # range(n, 0, -1) is the interval n, n - 1, ..., 1
    for l in range(n, 0, -1):
        a = best[l][b][1]
        lines.append(' '.join(words[a:b]))
        b = a
    lines.reverse()
    return lines


if __name__ == '__main__':
    import doctest
    doctest.testmod()
```
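For comparison, here is a sketch of the unmodified O(#words ** 2) minimum-raggedness DP that the answer started from: the caller fixes LineWidth and the DP chooses the number of lines itself. `min_raggedness` is a hypothetical name, and unlike `minragged` this version only allows non-empty lines.

```python
def min_raggedness(text, linewidth):
    """Unconstrained minimum-raggedness DP (the number of lines is free):
    best[j] = min over i < j of best[i] + (linewidth - width(words[i:j])) ** 2
    """
    words = text.split()
    # prefix sums of word lengths, so a line's width is O(1) to compute
    prefix = [0]
    for word in words:
        prefix.append(prefix[-1] + len(word))

    def cost(i, j):
        width = (prefix[j] - prefix[i]) + (j - i - 1)  # words plus spaces
        return (linewidth - width) ** 2

    best = [0.0] + [float("inf")] * len(words)
    start = [0] * (len(words) + 1)  # start[j]: where the last line of words[:j] begins
    for j in range(1, len(words) + 1):
        for i in range(j):  # last line is words[i:j], never empty
            candidate = best[i] + cost(i, j)
            if candidate < best[j]:
                best[j], start[j] = candidate, i
    # walk the break points back from the end of the text
    lines, j = [], len(words)
    while j > 0:
        i = start[j]
        lines.append(" ".join(words[i:j]))
        j = i
    return lines[::-1]


print(min_raggedness("Just testing to see how this works.", 11))
# ['Just testing', 'to see how', 'this works.']
```

On the question's example with LineWidth = 11 it happens to find the same three lines, but in general nothing forces the result to have exactly n lines; the extra index l in `best[l][k]` is what pins the line count down.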
