字母尽可能大的矩形 [英] largest possible rectangle of letters

查看:208
本文介绍了字母尽可能大的矩形的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

  

写程序找出的字母的最大可能的矩形,使得每行构成一个字(从左到右)和每列形成一个字(从上到下)。

我发现了这个有趣的问题。这不是功课,虽然它听起来如此。 (我不是在学校)。我这样做是为了好玩。

  

示例

     

API 代表提示的,我们得到以下的矩形(这是一个方形):

 ÇA R
猿
小费
 

我最初的想法是建立一个某种形式的一个preFIX树这样我就可以检索开始使用特定字符串的所有单词。当我们已经有2个或更多的话(或者读从上到下或从左到右)这将是有益的,我们需要找到下一个单词来添加。

任何其他的想法?

修改

难道这是一个长方体(3D矩形)做了什么?

如果有什么需要有有效的话就对角线(理念信用:user645466);怎么会在算法中为它进行优化?

解决方案

设D是字典。修复m和n。我们可以公式化找出一个m&倍的问题; ñ矩形作为约束满足问题(CSP)。

  

X <子> I,1 &hellip,X <子> I,N &ISIN代码; D&NBSP;&NBSP;&NBSP;&NBSP;&FORALL; I&ISIN代码; {1,&hellip;,米}
  x <子> 1,J &hellip,X <子>米,J &ISIN代码; D&NBSP;&NBSP;&NBSP;&NBSP;&FORALL; J&ISIN代码; {1,&hellip;,N}
  x <子> I,J &ISIN代码; {A,&hellip,Z}&NBSP;&NBSP;&NBSP;&NBSP;&FORALL; I&ISIN代码; {1,&hellip,M},与FORALL; J&ISIN代码; {1,&hellip;,N}

解决电信运营商一个非常普遍的做法是结合约束传播回溯。严格地说,回溯我们挑一个变量的手段,猜测它的价值,并递归地解决子问题与少一个变量,约束传播手段试图减少的可能性的数量为每个变量(可能是零,这意味着没有解决方案)。

作为一个例子,我们可以开始一个3次; 3格通过选择X <子> 1.1 = 问:

  Q 20
???
???
 

使用英文词典,x的唯一的可能性<子> 1,2 和X <子> 2,1 是 U (在拼字游戏&ldquo之前;也就是说&rdquo;的)。

解决电信运营商的艺术回溯和约束传播之间的平衡。如果我们没有在所有传播的限制,那么我们只是使用蛮力。如果我们完全传播的限制,那么我们就需要原路返回,而是一个传播算法,通过自身解决的NP问题可能是相当昂贵的运行。

在此问题,用一个大字典,每个回溯节点的工作会得到昂贵的,除非我们有很好的数据结构的支持。我将概述了使用线索或的 DAWG 快速计算的字母集,通过该preFIX延伸到一个完整的字。

在每个回溯节点,集我们已经赋值的变量是一个年轻的画面。换句话说,没有变量分配,直到它的左上方和变量被分配。在下图中,表示分配的变量, * 表示未分配的变量。

  ..... *
... * ??
... * ??
.. * ???
* ?????
 

变量标记 * 的候选人将被分配一个值下。具有多个选择,而每次选择一个固定变量的优点是,一些可变排序可以是优于其他

对于每个 * ,使两个查找到线索/ DAWG,一个水平,一个是垂直的,计算的字母集合的交集,可以来到旁边。一个经典的策略是选择与希望最少的可能性,我们得出一个矛盾快变量。我喜欢这个策略,因为它修剪自然当有零的可能性变和自然传播时,有一个变量。通过缓存的结果,我们可以评估每个节点的速度非常快。

Write a program to find the largest possible rectangle of letters such that every row forms a word (left to right) and every column forms a word (top to bottom).

I found this interesting question. It's not homework, though it may sound as such. (I'm not in school). I'm doing this for fun.

Example

From cat, car, ape, api, rep, tip we get the following rectangle (which is a square):

c a r
a p e
t i p

My initial idea is to build a some sort of a prefix tree so I can retrieve all words that start with a specific string. This would be useful when we already have 2 or more words (either reading top to bottom or left to right) and we need to find the next word to add.

Any other ideas?

Edit

Could this be done with a cuboid (3D rectangle)?

What if it needs to have valid words on the diagonals (idea credit: user645466); how would the algo for it be optimized?

解决方案

Let D be the dictionary. Fix m and n. We can formulate the problem of finding an m × n rectangle as a constraint satisfaction problem (CSP).

xi,1…xi,n ∈ D    ∀i ∈ {1, …, m}
x1,j…xm,j ∈ D    ∀j ∈ {1, …, n}
xi,j ∈ {A, …, Z}    ∀i ∈ {1, …, m}, ∀j ∈ {1, …, n}

A very common approach for solving CSPs is to combine backtracking with constraint propagation. Loosely speaking, backtracking means we pick a variable, guess its value, and recursively solve the subproblem with one fewer variable, and constraint propagation means trying to reduce the number of possibilities for each variable (possibly to zero, which means there's no solution).

As an example, we might start a 3 × 3 grid by choosing x1,1 = Q.

Q??
???
???

With an English dictionary, the only possibility for x1,2 and x2,1 is U (in before Scrabble “words”).

The art of solving CSPs is balancing between backtracking and constraint propagation. If we don't propagate constraints at all, then we're just using brute force. If we propagate constraints perfectly, then we don't need to backtrack, but a propagation algorithm that solves an NP-hard problem by itself is probably rather expensive to run.

In this problem, working with a large dictionary at each backtracking node will get expensive unless we have good data structure support. I'll outline an approach that uses a trie or a DAWG quickly to compute the set of letters via which a prefix extends to a complete word.

At each backtracking node, the set of variables we have assigned is a Young tableau. In other words, no variable is assigned until the variables above it and to the left have been assigned. In the diagram below, . denotes an assigned variable and * and ? denote unassigned variables.

.....*
...*??
...*??
..*???
*?????

The variables marked * are candidates for the next to be assigned a value. The advantage of having multiple choices rather choosing a fixed variable each time is that some variable orderings can be much better than others.

For each *, make two lookups into the trie/DAWG, one for the horizontal and one for the vertical, and compute the intersection of the sets of letters that can come next. One classic strategy is to choose the variable with the fewest possibilities in the hope that we reach a contradiction faster. I like this strategy because it prunes naturally when there's a variable with zero possibilities and propagates naturally when there's a variable with one. By caching results, we can make evaluating each node very fast.

这篇关于字母尽可能大的矩形的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆