English string segmentation

Published: 2019/6/20 0:52:35 · Algorithms


Question

Recently I ran into a problem with English string segmentation in my project.
There is a database of substrings, and I need to segment an input string as completely as possible by matching it against that substring database.

For example:
inputstring = "womenshic++programmerandorcodeproject";
substringdatebase[] = {"wo", "men", "wome",
                       "c", "c++", "shi",
                       "program", "programmer", "and",
                       "or", "code", "project",
                       ..........
                      };
After segmentation, the output should be:
outstring = {"wo", "men", "shi", .....};
Solution
For efficiency of the search, you can represent your substrings using a trie data structure (http://en.wikipedia.org/wiki/Trie).
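Here is a minimal C++ sketch of such a trie. It assumes the database holds plain byte strings, so `c++` is just three bytes. The class `Trie` and its method `matchLengths` are hypothetical names and do not come from the answer. `matchLengths` walks down the trie once, starting from a given position, and reports the length of every database substring that starts there.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Minimal byte-wise trie for the substring database (hypothetical sketch).
class Trie {
public:
    // Add one substring from the database; empty strings are rejected.
    void insert(const std::string& word) {
        assert(!word.empty() && "empty substrings are not allowed");
        Node* node = &root_;
        for (unsigned char ch : word) {
            std::unique_ptr<Node>& child = node->children[ch];
            if (!child) child = std::make_unique<Node>();
            node = child.get();
        }
        node->terminal = true;
    }

    // Lengths of all database substrings that start at text[pos], shortest
    // first. A single walk down the trie finds every one of them.
    std::vector<std::size_t> matchLengths(const std::string& text, std::size_t pos) const {
        std::vector<std::size_t> lengths;
        const Node* node = &root_;
        for (std::size_t i = pos; i < text.size(); ++i) {
            node = node->children[static_cast<unsigned char>(text[i])].get();
            if (node == nullptr) break;  // no database entry continues this way
            if (node->terminal) lengths.push_back(i - pos + 1);
        }
        return lengths;
    }

private:
    struct Node {
        std::array<std::unique_ptr<Node>, 256> children{};  // one slot per byte value
        bool terminal = false;  // true if a database substring ends here
    };
    Node root_;
};
```

Take the database { "ab", "abc", "ce" }. `matchLengths("abce", 0)` reports lengths 2 and 3, which is exactly the choice the next step has to make.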

Then, starting from the current position, repeatedly search for the longest string that matches some substring in the database and move past it. This is not a bulletproof solution, though: a long substring match can supersede a shorter one and lead to a dead end.
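Here is a sketch of that greedy loop under the same assumptions. To stay self-contained, it uses a `std::unordered_set` in place of the trie and tries candidate lengths from longest to shortest. With a trie, the longest match at a position would simply be the largest length found while walking down from that position. `greedySegment` is a hypothetical name. Returning `std::nullopt` for a dead end is a design choice of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <unordered_set>
#include <vector>

// Greedy longest-match segmentation (hypothetical sketch). A hash set stands
// in for the trie; std::nullopt signals a dead end.
std::optional<std::vector<std::string>> greedySegment(
        const std::string& text, const std::vector<std::string>& database) {
    std::unordered_set<std::string> words;
    std::size_t maxLen = 0;
    for (const std::string& w : database) {
        assert(!w.empty() && "empty substrings are not allowed");
        words.insert(w);
        maxLen = std::max(maxLen, w.size());
    }

    std::vector<std::string> pieces;
    std::size_t pos = 0;
    while (pos < text.size()) {
        // Try the longest possible candidate first, then shrink it.
        std::size_t len = std::min(maxLen, text.size() - pos);
        while (len > 0 && words.count(text.substr(pos, len)) == 0) --len;
        if (len == 0) return std::nullopt;  // nothing in the database matches here
        pieces.push_back(text.substr(pos, len));
        pos += len;  // commit to this match; greedy never revisits it
    }
    return pieces;
}
```

The question's own example shows the weakness. At position 0 the longest match is "wome", and nothing in the database starts with "n". So `greedySegment` returns `std::nullopt`, even though "wo" followed by "men" works.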

Example: looking for { "ab", "abc", "ce" } in "abce" would detect "abc" and then fail on "e", whereas there is a solution with "ab" followed by "ce".
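One standard way around the dead end is dynamic programming; it is swapped in here and is not part of the original answer. For every prefix of the input, record whether it can be split completely into database substrings and which piece ended it, then walk back from the end. The question's "as much as possible" is open to interpretation, so this sketch assumes the goal is a full segmentation with the fewest pieces. `dpSegment` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <optional>
#include <string>
#include <unordered_set>
#include <vector>

// Dynamic-programming segmentation (hypothetical sketch): finds a split of the
// whole input into database substrings whenever one exists, preferring the
// fewest pieces. Returns std::nullopt if no full split exists.
std::optional<std::vector<std::string>> dpSegment(
        const std::string& text, const std::vector<std::string>& database) {
    std::unordered_set<std::string> words;
    std::size_t maxLen = 0;
    for (const std::string& w : database) {
        assert(!w.empty() && "empty substrings are not allowed");
        words.insert(w);
        maxLen = std::max(maxLen, w.size());
    }

    const std::size_t n = text.size();
    const std::size_t kNone = std::numeric_limits<std::size_t>::max();
    // best[i]: fewest pieces covering text[0, i); lastLen[i]: length of the final piece.
    std::vector<std::size_t> best(n + 1, kNone), lastLen(n + 1, 0);
    best[0] = 0;
    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t len = 1; len <= std::min(maxLen, i); ++len) {
            const std::size_t start = i - len;
            if (best[start] == kNone || words.count(text.substr(start, len)) == 0) continue;
            if (best[start] + 1 < best[i]) {
                best[i] = best[start] + 1;
                lastLen[i] = len;
            }
        }
    }
    if (best[n] == kNone) return std::nullopt;  // no full segmentation exists

    // Walk back from the end to recover the pieces, then restore left-to-right order.
    std::vector<std::string> pieces;
    for (std::size_t i = n; i > 0; i -= lastLen[i]) {
        pieces.push_back(text.substr(i - lastLen[i], lastLen[i]));
    }
    std::reverse(pieces.begin(), pieces.end());
    return pieces;
}
```

For "abce" with { "ab", "abc", "ce" }, it returns { "ab", "ce" }. For the question's input, it returns wo, men, shi, c++, programmer, and, or, code, project. It tries at most maxLen candidate pieces ending at each position, so for a fixed database the work grows linearly with the input length. Preferring more pieces, or the longest covered prefix, would only change the comparison inside the inner loop.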
