Splitting text into sentences and sentences into words: BreakIterator vs. regular expressions

Views: 496 | Published: 2016/12/21 23:35:16 | Tags: java regex string comparison tokenize

I happened to answer a question where the original problem involved splitting a sentence into separate words.

The author suggested using BreakIterator to tokenize the input strings, and some people liked the idea.

I just don't get that madness: how can 25 lines of complicated code be better than a simple regex one-liner?
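To make the comparison concrete, here is a minimal sketch of the two approaches side by side. It is not the code from the answer in question: the class name `WordSplitSketch`, both method names, the regex, and the sample sentence are assumptions chosen for illustration. Only `String.split`, `BreakIterator.getWordInstance`, `setText`, `first`, `next`, and `BreakIterator.DONE` are standard JDK API.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class WordSplitSketch {

    // Regex approach (illustrative pattern): split on runs of characters
    // that are neither Unicode letters nor digits.
    static List<String> splitWithRegex(String sentence) {
        List<String> words = new ArrayList<>();
        for (String token : sentence.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) { // split() can yield a leading empty string
                words.add(token);
            }
        }
        return words;
    }

    // BreakIterator approach: walk the locale-aware word boundaries and keep
    // only the segments that start with a letter or digit, which drops the
    // whitespace and punctuation segments the iterator also reports.
    static List<String> splitWithBreakIterator(String sentence, Locale locale) {
        List<String> words = new ArrayList<>();
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(sentence);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String token = sentence.substring(start, end);
            if (Character.isLetterOrDigit(token.codePointAt(0))) {
                words.add(token);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        String s = "Hello, world! How are you?";
        System.out.println(splitWithRegex(s));                        // [Hello, world, How, are, you]
        System.out.println(splitWithBreakIterator(s, Locale.ENGLISH)); // [Hello, world, How, are, you]
    }
}
```

On plain English input like this the two agree. They may disagree on contractions, hyphenated words, or text in languages that do not separate words with spaces, so that is worth checking against your own data rather than taking from this sketch.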

Please explain the advantages of using BreakIterator and the real cases in which it should be used.

If it's really so cool and proper, then I wonder: do you actually use the BreakIterator approach in your projects?