English string segmentation
Problem description
Recently I ran into a question about English string segmentation in my project.
There is a database of substrings, and I need to segment an input string as completely as possible by matching it against that database.
For example:
inputstring = "womenshic++programmerandorcodeproject";
substringdatebase[] = { "wo", "men", "wome",
                        "c", "c++", "shi",
                        "program", "programmer", "and",
                        "or", "code", "project"
                        .......... };
After segmentation the result should be:
outstring = { "wo", "men", "shi" ..... };
Solution
For efficiency of the search, you can represent your substrings using a trie data structure (http://en.wikipedia.org/wiki/Trie).
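As a rough illustration of the trie idea, here is a minimal Python sketch that stores the substrings in a nested-dict trie and lists every stored substring starting at a given position of the input. The names `END`, `build_trie` and `matches_at` are hypothetical helpers made up for this example, not part of the original question or of any library.

```python
END = ""  # marker key; cannot collide with a single-character edge label


def build_trie(words):
    """Build a nested-dict trie; the END key marks the end of a stored word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def matches_at(trie, text, start):
    """Return every stored word that begins at text[start], shortest first."""
    found = []
    node = trie
    for i in range(start, len(text)):
        node = node.get(text[i])
        if node is None:
            break  # no stored word continues with this character
        if END in node:
            found.append(text[start:i + 1])
    return found
```

A single walk down the trie finds all candidate matches at one position, so the cost per position is bounded by the length of the longest stored substring rather than by the size of the database.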
Then repeatedly perform a search for the longest string that matches some substring. Anyway, this is not a bulletproof solution, as long substring matches could supersede shorter substring matches and lead to a dead end.
Example: looking for { "ab", "abc", "ce" } in "abce" would detect "abc" and then fail on "e", whereas there is a solution with "ab" followed by "ce".
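The dead end described above can be avoided by falling back to shorter matches when the longest one fails. The sketch below contrasts the greedy longest-match approach with a memoised backtracking search. It is only one possible way to do this, under the assumption that any complete segmentation is acceptable. The function names `segment_greedy` and `segment_backtracking` are made up for this example, and plain `str.startswith` checks stand in for the trie lookup to keep the code short.

```python
from functools import lru_cache


def segment_greedy(text, words):
    """Always take the longest match; return None on a dead end."""
    pieces, pos = [], 0
    while pos < len(text):
        candidates = [w for w in words if w and text.startswith(w, pos)]
        if not candidates:
            return None  # nothing matches here: greedy choice led to a dead end
        longest = max(candidates, key=len)
        pieces.append(longest)
        pos += len(longest)
    return pieces


def segment_backtracking(text, words):
    """Try longer matches first, but fall back to shorter ones when stuck."""
    ordered = sorted((w for w in set(words) if w), key=len, reverse=True)

    @lru_cache(maxsize=None)  # each position is solved at most once
    def solve(pos):
        if pos == len(text):
            return ()
        for w in ordered:
            if text.startswith(w, pos):
                rest = solve(pos + len(w))
                if rest is not None:
                    return (w,) + rest
        return None  # no segmentation exists from this position

    result = solve(0)
    return None if result is None else list(result)
```

With { "ab", "abc", "ce" } and the input "abce", `segment_greedy` returns None while `segment_backtracking` returns ["ab", "ce"]. Because `solve` is recursive, very long inputs may need an iterative rewrite to stay within Python's recursion limit.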