How to split paragraphs into sentences?


Problem description

Please take a look at the following.

String[] sentenceHolder = titleAndBodyContainer.split("\n|\\.(?!\\d)|(?<!\\d)\\.");

This is how I tried to split a paragraph into sentences, but there is a problem. My paragraph includes dates like Jan. 13, 2014, words like U.S and numbers like 2.2, and they all get split by the above code. Basically, this code splits on many dots regardless of whether they are full stops.
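To make the over-splitting concrete, here is a minimal sketch that runs the pattern above on a made-up sample. The class name `OverSplitDemo` and the sample sentence are assumptions introduced only for illustration; they are not part of the original question.

```java
import java.util.Arrays;

public class OverSplitDemo {
    // The pattern from the question: split on a newline, on a dot that is not
    // followed by a digit, or on a dot that is not preceded by a digit.
    static String[] split(String text) {
        return text.split("\n|\\.(?!\\d)|(?<!\\d)\\.");
    }

    public static void main(String[] args) {
        // Sample sentence invented for this illustration.
        String sample = "Jan. 13 costs 2.2 in the U.S today.";
        System.out.println(Arrays.toString(split(sample)));
        // prints [Jan,  13 costs 2.2 in the U, S today]
    }
}
```

With this pattern, only a dot that has a digit on both sides is left alone, so the dots in abbreviations such as "Jan." and "U.S" are treated as sentence boundaries.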

I also tried String[] sentenceHolder = titleAndBodyContainer.split(".\n"); and String[] sentenceHolder = titleAndBodyContainer.split("\\.");. Both failed.

How can I split a paragraph into sentences "properly"?

Recommended answer

You can try this:

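import java.util.regex.Matcher;
import java.util.regex.Pattern;

String str = "This is how I tried to split a paragraph into a sentence. But, there is a problem. My paragraph includes dates like Jan.13, 2014 , words like U.S and numbers like 2.2. They all got split by the above code.";

// A sentence starts at a non-space, non-terminator character and runs until a
// '.', '!' or '?' (optionally followed by a quote) that precedes whitespace or the end of input.
Pattern re = Pattern.compile("[^.!?\\s][^.!?]*(?:[.!?](?!['\"]?\\s|$)[^.!?]*)*[.!?]?['\"]?(?=\\s|$)", Pattern.MULTILINE | Pattern.COMMENTS);
Matcher reMatcher = re.matcher(str);
while (reMatcher.find()) {
    System.out.println(reMatcher.group());
}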

Output:

This is how I tried to split a paragraph into a sentence.
But, there is a problem.
My paragraph includes dates like Jan.13, 2014 , words like U.S and numbers like 2.2.
They all got split by the above code.
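Note that the sample string above uses "Jan.13" without a space. With the question's original "Jan. 13", the dot is followed by whitespace, so the pattern would end the sentence there. One possible workaround, sketched below under stated assumptions, is to glue a match back onto the next one when it ends in a known abbreviation. The class name `SentenceSplitter` and the abbreviation list are hypothetical and would need to be adapted to the actual text; this is not part of the original answer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SentenceSplitter {
    // Hypothetical abbreviation list; extend it for your own text.
    private static final Set<String> ABBREVIATIONS =
            new HashSet<>(Arrays.asList("Jan.", "Feb.", "Mr.", "Dr.", "U.S."));

    // Same pattern as in the answer above.
    private static final Pattern SENTENCE = Pattern.compile(
            "[^.!?\\s][^.!?]*(?:[.!?](?!['\"]?\\s|$)[^.!?]*)*[.!?]?['\"]?(?=\\s|$)",
            Pattern.MULTILINE);

    public static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        StringBuilder pending = new StringBuilder();
        Matcher m = SENTENCE.matcher(text);
        while (m.find()) {
            // Join this match to any fragment held back from the previous step.
            if (pending.length() > 0) {
                pending.append(' ');
            }
            pending.append(m.group());
            // If the fragment ends in a known abbreviation, keep accumulating.
            String lastWord = pending.substring(pending.lastIndexOf(" ") + 1);
            if (!ABBREVIATIONS.contains(lastWord)) {
                sentences.add(pending.toString());
                pending.setLength(0);
            }
        }
        // Flush a trailing fragment that ended in an abbreviation.
        if (pending.length() > 0) {
            sentences.add(pending.toString());
        }
        return sentences;
    }
}
```

For example, SentenceSplitter.split("My paragraph includes dates like Jan. 13, 2014 and the U.S. market. It works.") returns two sentences, the first ending in "market.". The trade-off is that a sentence which really ends in a listed abbreviation is merged with the one that follows, and the whitespace between merged fragments is normalized to a single space.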
