拆分一个字符串的有效字使用动态规划字符串 [英] Split a string to a string of valid words using Dynamic Programming

查看:591
本文介绍了拆分一个字符串的有效字使用动态规划字符串的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我需要找到一个动态规划算法来解决这个问题。我试过,但不能看着办吧。这里的问题是:

您给出的n个字符S [1 ... N],您认为是其所有的标点已经消失已损坏的文本文档的字符串(这样它看起来像itwasthebestoftimes ......)。希望使用字典,它可在一个布尔函数字典的形式来重建文件(*),使得对于任何字符串瓦特,字典(w)的具有值1,如果w是一个有效的字,并具有值0并非如此。

  1. 在给一个动态编程算法来决定是否字符串s [*]可改组为有效字序列。运行时间应该是最多为O(n ^ 2),假设每次调用词典需要单位时间。
  2. 在事件的字符串是有效的,让你的算法输出字的对应序列。
解决方案

让您压实文档的长度为N。

让B(n)是一个布尔值:如果此文件可以被分成从位置N文件中启动的话

B(N)为真(因为空字符串可分成0字)。 鉴于B(N),B(N - 1)... B(N - K)(1 N - - K)通过考虑开始字符n的所有单词 - 如果有1 - K,你可以构造b任何这样的话,瓦,B(N - K - 1 + LEN(W))集,然后设置B(N - K - 1)为true。如果没有这样的话,那么B组(N - K - 1)假

最后,你计算B(0),它会告诉你,如果整个文档可以分割成词。

在伪code:

 高清try_to_split(DOC):
  N = LEN(DOC)
  B = [假] *(N + 1)
  B〔N] = TRUE
  对于i在范围(N  -  1,-1,-1):
    字起始位置i:
      如果B〔1 + LEN(字)]:
        B〔I] = TRUE
        打破
  回复B
 

有一些技巧,你可以做的就是'字起始位置i'高效,但你要求为O(n ^ 2)算法,因此你可以查找每一个在字典中开始我的字符串。

要生成的话,你可以修改上面的算法来存储好话,或只是产生这样的:

 高清generate_words(DOC,B,IDX = 0):
  长度= 1
  而真正的:
    断言B(IDX)
    如果IDX == LEN(DOC):返回
    字= DOC [IDX:IDX +长]
    如果字在字典和B(IDX +长):
       输出(字)
       IDX + =长度
       长度= 1
 

下面b是该算法的第一部分产生的布尔数组

I need to find a dynamic programming algorithm to solve this problem. I tried but couldn't figure it out. Here is the problem:

You are given a string of n characters s[1...n], which you believe to be a corrupted text document in which all punctuation has vanished (so that it looks something like "itwasthebestoftimes..."). You wish to reconstruct the document using a dictionary, which is available in the form of a Boolean function dict(*) such that, for any string w, dict(w) has value 1 if w is a valid word, and has value 0 otherwise.

  1. Give a dynamic programming algorithm that determines whether the string s[*] can be reconstituted as a sequence of valid words. The running time should be at most O(n^2), assuming that each call to dict takes unit time.
  2. In the event that the string is valid, make your algorithm output the corresponding sequence of words.

解决方案

Let the length of your compacted document be N.

Let b(n) be a boolean: true if the document can be split into words starting from position n in the document.

b(N) is true (since the empty string can be split into 0 words). Given b(N), b(N - 1), ... b(N - k), you can construct b(N - k - 1) by considering all words that start at character N - k - 1. If there's any such word, w, with b(N - k - 1 + len(w)) set, then set b(N - k - 1) to true. If there's no such word, then set b(N - k - 1) to false.

Eventually, you compute b(0) which tells you if the entire document can be split into words.

In pseudo-code:

def try_to_split(doc):
  N = len(doc)
  b = [False] * (N + 1)
  b[N] = True
  for i in range(N - 1, -1, -1):
    for word starting at position i:
      if b[i + len(word)]:
        b[i] = True
        break
  return b

There's some tricks you can do to get 'word starting at position i' efficient, but you're asked for an O(N^2) algorithm, so you can just look up every string starting at i in the dictionary.

To generate the words, you can either modify the above algorithm to store the good words, or just generate it like this:

def generate_words(doc, b, idx=0):
  length = 1
  while true:
    assert b(idx)
    if idx == len(doc): return
    word = doc[idx: idx + length]
    if word in dictionary and b(idx + length):
       output(word)
       idx += length
       length = 1

Here b is the boolean array generated from the first part of the algorithm.

这篇关于拆分一个字符串的有效字使用动态规划字符串的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆