用字典解析字符串的算法 [英] algorithm to parse string with dictionary
问题描述
给定
-
一个充满词汇的字典
{in,july,den,dentalist,best ,...}
与一些C ++ API访问它:boolean findWord(string word)
或string getNextWord void)
来迭代它, -
一些没有空格的输入字符串,例如:
bestdentistinjuly
...
输出
-
7月份最好的牙医是...
(基本上按照给定字典的空格分隔非空格字符串)
解决它最好的算法是什么?
一个微妙但重要的问题是,是否有任何花哨的方式来解决不可逾越的死胡同的问题。例如, den
和 dentist
都是有效的词来剖析字符串的其余部分,其中一个可能只是一个对于我来说,这似乎是一个贪心的问题,或者动态编程可解决的东西。
您可以创建一种单词树:
您可以通过没有空格的字符串。一旦你在你的列表中找到一个单词,你可以添加一个空格,你继续,直到你不能进一步。
然后你回到前一个单词,并尝试添加新的信件,你可以创建一个单词,如果你可以继续他们的。
你尝试这样做,直到你尝试所有的可能性。
如果你回到起始单词,你没有找到任何解决方案=>否sol
这是算法(我的伪代码语法不好,但你可以得到一般的想法,我相信你必须纠正它有一点):
TreeWordResult //将结果保存在内存中的树
通过InputString:
如果在InputDictionnary
中找到一个单词,则将此单词添加到treeWordResult
的最后一个节点否则:
while(否找到的单词)
返回treeWordResult
尝试在InputDictionnary中找到与之前不同的单词(在另一个节点上)
endwhile
如果没有找到单词:
return NO SOLUTION
否则:
继续通过单词
endif
endif
return Leaf
当您找不到任何解决方案时,或者当您在叶(你穿过整个字符串)时,算法结束
以下是使用示例的说明:
希望我的解释很清楚。
Given
a dictionary full of words
{in, july, den, dentist, best, ...}
with some C++ API to access it:boolean findWord(string word)
, orstring getNextWord(void)
to iterate through it,some input string with no space, e.g.:
bestdentistinjuly
...
Output
best dentist in july is...
(basically separate the non-space string by space given a dictionary)
What will be the best algorithm to solve it?
A subtle but important question is, is there any fancy way to solve the unreachable dead-end problem. E.g., den
and dentist
are both valid words to dissect the rest of the string, one of them may just be a dead-end.
To me it seems like a greedy problem or something solvable by dynamic programming..
You can create a kind of word tree:
You can go throught the string with no space. Once you find a word in your list, you add a space and you continu... until you cannot go further.
Then you go back to the previous word and try to se if adding new letter you can create a word, and if you can continu from their.
You try this until you tried all the possiblities.
If you go back to the starting word and you don't find any solution => no sol
Here is the algorithm ( my pseudocode syntax is not good, but you can get the general idea. I believe you will have to correct it a little):
TreeWordResult //Tree that keeps the results in memory
Go through the InputString:
If you find a word in the InputDictionnary
Then add this word to the last node of the treeWordResult
Otherwise:
while (No word found):
go back in the treeWordResult
try to find word in InputDictionnary different from the one before (on the other node)
endwhile
if no word found:
return NO SOLUTION
otherwise:
continue going through word
endif
endif
return Leaf
Algorithm ends when you find no sol, or when your at a "leaf" (you went thhrough the whole string)
Here is an illustration using your example:
Hope my explaination is clear.
这篇关于用字典解析字符串的算法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!