How to split paragraphs into sentences?
Question
Please have a look at the following.
String[] sentenceHolder = titleAndBodyContainer.split("\n|\\.(?!\\d)|(?<!\\d)\\.");
This is how I tried to split a paragraph into sentences. But there is a problem. My paragraph includes dates like Jan. 13, 2014, words like U.S and numbers like 2.2. They all got split by the above code. So basically, this code splits on a lot of dots, whether or not they are full stops.
I tried String[] sentenceHolder = titleAndBodyContainer.split(".\n"); and String[] sentenceHolder = titleAndBodyContainer.split("\\."); as well. All failed.
How can I split a paragraph into sentences "properly"?
Recommended answer
You can try this:
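import java.util.regex.Matcher;
import java.util.regex.Pattern;

String str = "This is how I tried to split a paragraph into a sentence. But, there is a problem. My paragraph includes dates like Jan.13, 2014 , words like U.S and numbers like 2.2. They all got split by the above code.";
// A sentence ends at a '.', '!' or '?' (plus an optional closing quote) that is followed by whitespace or the end of the input.
Pattern re = Pattern.compile("[^.!?\\s][^.!?]*(?:[.!?](?!['\"]?\\s|$)[^.!?]*)*[.!?]?['\"]?(?=\\s|$)", Pattern.MULTILINE | Pattern.COMMENTS);
Matcher reMatcher = re.matcher(str);
// Each successful find() yields one sentence.
while (reMatcher.find()) {
    System.out.println(reMatcher.group());
}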
Output:
This is how I tried to split a paragraph into a sentence.
But, there is a problem.
My paragraph includes dates like Jan.13, 2014 , words like U.S and numbers like 2.2.
They all got split by the above code.
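One caveat: the expression treats every '.', '!' or '?' that is followed by whitespace as a sentence end, so the question's original "Jan. 13, 2014" (with a space after the dot) would still be cut after "Jan.". Below is a minimal sketch of one possible workaround, assuming you can list the abbreviations that occur in your text. The class name SentenceSplitter, the split method and the ABBREVIATIONS set are hypothetical names chosen for illustration and are not part of the answer above. The regex itself is unchanged apart from dropping the Pattern.COMMENTS flag, which has no effect on this pattern.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SentenceSplitter {
    // Same expression as in the answer above.
    private static final Pattern SENTENCE = Pattern.compile(
        "[^.!?\\s][^.!?]*(?:[.!?](?!['\"]?\\s|$)[^.!?]*)*[.!?]?['\"]?(?=\\s|$)",
        Pattern.MULTILINE);

    // Hypothetical abbreviation list (an assumption); extend it for your own text.
    private static final Set<String> ABBREVIATIONS =
        new HashSet<>(Arrays.asList("Jan.", "Feb.", "U.S.", "Dr.", "Mr."));

    static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        StringBuilder pending = new StringBuilder();
        Matcher m = SENTENCE.matcher(text);
        while (m.find()) {
            String chunk = m.group();
            if (pending.length() > 0) {
                pending.append(' ');
            }
            pending.append(chunk);
            // If the chunk ends in a known abbreviation, the dot was not a
            // full stop: hold the text back and glue the next chunk onto it.
            String lastWord = chunk.substring(chunk.lastIndexOf(' ') + 1);
            if (!ABBREVIATIONS.contains(lastWord)) {
                sentences.add(pending.toString());
                pending.setLength(0);
            }
        }
        // Flush whatever is left if the text ended on an abbreviation.
        if (pending.length() > 0) {
            sentences.add(pending.toString());
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "My paragraph includes dates like Jan. 13, 2014. "
            + "It also has numbers like 2.2. Done.";
        for (String sentence : split(text)) {
            System.out.println(sentence);
        }
    }
}
```

With the sample text in main, the "Jan." chunk is merged with the chunk that follows it, so the date stays inside one sentence. A fixed list will never cover every abbreviation, and a sentence that genuinely ends in one of them gets merged with the next, so treat this as a starting point, not a complete solution.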