文本对齐算法 [英] Text Justification Algorithm

查看:104
本文介绍了文本对齐算法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这是DP中一个非常著名的问题,有人可以帮助可视化它的递归部分。如何生成排列或组合。

This is a very famous problem in DP, Can somebody help to visualize the recursion part of it.How are the Permutations or Combinations will be generated.

问题参考。
https://www.geeksforgeeks.org/dynamic -programming-set-18-word-wrap /

推荐答案

给出最大线宽为L的想法证明文本T的正当性,是考虑文本的所有后缀(考虑单词而不是字符以形成精确的后缀。)
动态编程不过是谨慎的蛮力。
如果考虑采用蛮力方法,则需要执行以下操作。

Given the maximum line width as L, the idea to justify the Text T, is to consider all suffixes of the Text (consider words instead of characters for forming suffixes to be precise.) Dynamic Programming is nothing but "Careful Brute-force". If you consider the brute force approach, you need to do the following.


  1. 考虑放入1,2,.. n

  2. 对于情况1中描述的每种情况(假设i词放在第1行中),请考虑将1,2,.. n -i词放在第1行中。

相反,我们只考虑问题来找出成本在行首放置一个单词。
通常,我们可以将DP(i)定义为将第(i-1)个单词视为行的开头的成本。

Instead lets just consider the problem to find out the cost of putting a word at the beginning of a line. In general we can define DP(i) to be the cost for considering (i- 1)th word as the beginning of a Line.

如何为DP(i)形成递归关系?

如果第j个单词是下一行的开头,则当前行将包含words [i:j](j排他),而第j个单词作为下一行开头的开销将为DP(j)。
因此,DP(i)= DP(j)+在当前行中放入单词[i:j)的成本
由于我们要最小化总成本,因此可以定义DP(i)如下

If jth word is the beginning of the next line, then the current line will contain words[i:j) (j exclusive) and the cost of jth word being the beginning of the next line will be DP(j). Hence DP(i) = DP(j) + cost of putting words[i:j) in the current line As we want to minimise the total cost, DP(i) can be defined as follows.

重复关系:


DP( i)= min {DP(j)+在[i + 1,n]中对所有j放置单词[i:j在当前行中的成本}

DP(i) = min { DP(j) + cost of putting words[i:j in the current line } for all j in [i+1, n]

注意j = n表示下一行没有剩余的单词。

Note j = n signify that no words are left to be put in the next line.

基本情况:DP( n)= 0 =>此时,将没有任何单词可写。

The base Case: DP(n) = 0 => at this point there is no word left to be written.

总结一下:


  1. 子问题:后缀,单词[:i]

  2. 猜:在哪里开始下一行,选择项的数目n-i-> O(n)

  3. 重复出现:DP(i)=最小值{DP(j)+在当前行中放置单词[i:j)的成本}}
    如果我们使用记忆,则表达式内部大括号应该花O(1)时间,循环运行O(n)次(选择次数#)。
    i从n变化到0 =>因此总复杂度降低到O(n ^ 2)。

现在即使我们得出了证明文本的最低代价,我们也需要通过跟踪上面表达式中选择为最小值的j值来解决原始问题,以便以后可以使用它打印出证明文本。这个想法是保留父指针。

Now even though we derived the minimum cost for justifying the text, we also need to solve the original problem by keeping track of the j value for chosen as minimum in the above expression, so that we can later use the same to print out the justified text. The idea is of keeping parent pointer.

希望这可以帮助您理解解决方案。以下是上述想法的简单实现。

Hope this helps you understand the solution. Below is the simple implementation of the above idea.

 public class TextJustify {
    class IntPair {
        //The cost or badness
        final int x;

        //The index of word at the beginning of a line
        final int y;
        IntPair(int x, int y) {this.x=x;this.y=y;}
    }
    public List<String> fullJustify(String[] words, int L) {
        IntPair[] memo = new IntPair[words.length + 1];

        //Base case
        memo[words.length] = new IntPair(0, 0);


        for(int i = words.length - 1; i >= 0; i--) {
            int score = Integer.MAX_VALUE;
            int nextLineIndex = i + 1;
            for(int j = i + 1; j <= words.length; j++) {
                int badness = calcBadness(words, i, j, L);
                if(badness < 0 || badness == Integer.MAX_VALUE) break;
                int currScore = badness + memo[j].x;
                if(currScore < 0 || currScore == Integer.MAX_VALUE) break;
                if(score > currScore) {
                    score = currScore;
                    nextLineIndex = j;
                }
            }
            memo[i] = new IntPair(score, nextLineIndex);
        }

        List<String> result = new ArrayList<>();
        int i = 0;
        while(i < words.length) {
            String line = getLine(words, i, memo[i].y);
            result.add(line);
            i = memo[i].y;
        }
        return result;
    }

    private int calcBadness(String[] words, int start, int end, int width) {
        int length = 0;
        for(int i = start; i < end; i++) {
            length += words[i].length();
            if(length > width) return Integer.MAX_VALUE;
            length++;
        }
        length--;
        int temp = width - length;
        return temp * temp;
    }


    private String getLine(String[] words, int start, int end) {
        StringBuilder sb = new StringBuilder();
        for(int i = start; i < end - 1; i++) {
            sb.append(words[i] + " ");
        }
        sb.append(words[end - 1]);

        return sb.toString();
    }
  }

这篇关于文本对齐算法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆