用字典解析字符串的算法 [英] algorithm to parse string with dictionary

查看:170
本文介绍了用字典解析字符串的算法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

给定




  • 一个充满词汇的字典 {in,july,den,dentalist,best ,...} 与一些C ++ API访问它: boolean findWord(string word) string getNextWord void)来迭代它,


  • 一些没有空格的输入字符串,例如: bestdentistinjuly ...




输出




  • 7月份最好的牙医是... (基本上按照给定字典的空格分隔非空格字符串)



解决它最好的算法是什么?



一个微妙但重要的问题是,是否有任何花哨的方式来解决不可逾越的死胡同的问题。例如, den dentist 都是有效的词来剖析字符串的其余部分,其中一个可能只是一个对于我来说,这似乎是一个贪心的问题,或者动态编程可解决的东西。

解决方案

您可以创建一种单词树:



您可以通过没有空格的字符串。一旦你在你的列表中找到一个单词,你可以添加一个空格,你继续,直到你不能进一步。



然后你回到前一个单词,并尝试添加新的信件,你可以创建一个单词,如果你可以继续他们的。



你尝试这样做,直到你尝试所有的可能性。



如果你回到起始单词,你没有找到任何解决方案=>否sol



这是算法(我的伪代码语法不好,但你可以得到一般的想法,我相信你必须纠正它有一点):

  TreeWordResult //将结果保存在内存中的树

通过InputString:

如果在InputDictionnary
中找到一个单词,则将此单词添加到treeWordResult
的最后一个节点否则:
while(否找到的单词)
返回treeWordResult
尝试在InputDictionnary中找到与之前不同的单词(在另一个节点上)
endwhile
如果没有找到单词:
return NO SOLUTION
否则:
继续通过单词
endif
endif
return Leaf

当您找不到任何解决方案时,或者当您在叶(你穿过整个字符串)时,算法结束



以下是使用示例的说明:





希望我的解释很清楚。


Given

  • a dictionary full of words {in, july, den, dentist, best, ...} with some C++ API to access it: boolean findWord(string word), or string getNextWord(void) to iterate through it,

  • some input string with no space, e.g.: bestdentistinjuly...

Output

  • best dentist in july is... (basically separate the non-space string by space given a dictionary)

What will be the best algorithm to solve it?

A subtle but important question is, is there any fancy way to solve the unreachable dead-end problem. E.g., den and dentist are both valid words to dissect the rest of the string, one of them may just be a dead-end.

To me it seems like a greedy problem or something solvable by dynamic programming..

解决方案

You can create a kind of word tree:

You can go throught the string with no space. Once you find a word in your list, you add a space and you continu... until you cannot go further.

Then you go back to the previous word and try to se if adding new letter you can create a word, and if you can continu from their.

You try this until you tried all the possiblities.

If you go back to the starting word and you don't find any solution => no sol

Here is the algorithm ( my pseudocode syntax is not good, but you can get the general idea. I believe you will have to correct it a little):

TreeWordResult //Tree that keeps the results in memory

Go through the InputString:

      If you find a word in the InputDictionnary 
          Then add this word to the last node of the treeWordResult
      Otherwise:
          while (No word found):
                go back in the treeWordResult
                try to find word in InputDictionnary different from the one before (on the other node)
          endwhile
          if no word found:
                     return NO SOLUTION
          otherwise:
                     continue going through word
          endif
       endif
 return Leaf   

Algorithm ends when you find no sol, or when your at a "leaf" (you went thhrough the whole string)

Here is an illustration using your example:

Hope my explaination is clear.

这篇关于用字典解析字符串的算法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆