How to identify the end of a sentence

Problem description

String x=" i am going to the party at 6.00 in the evening. are you coming with me?";

Given the string above, I need to break it into sentences at sentence-boundary punctuation (such as . and ?).

But it should not split the sentence at 6.00, because the period there is a decimal point, not the end of a sentence. Is there a way to identify the correct sentence boundaries in Java? I have tried StringTokenizer from the java.util package, but it breaks the sentence at every period it finds. Can someone suggest a way to do this correctly?

This is the method I tried for tokenizing a text into sentences.

public static ArrayList<String> sentence_segmenter(String text) {
    ArrayList<String> Sentences = new ArrayList<String>();

    StringTokenizer st = new StringTokenizer(text, ".?!");
    while (st.hasMoreTokens()) {

        Sentences.add(st.nextToken());
    }
    return Sentences;
}

I also have a method that segments sentences into phrases, but it has the same problem: it splits the text at every comma (,). I do not want it to split when the comma sits inside a number such as 60,000. This is the method I am using to segment the phrases.

public static ArrayList<String> phrasesSegmenter(String text) {
    ArrayList<String> phrases = new ArrayList<String>();
    StringTokenizer st = new StringTokenizer(text, ",");
    while (st.hasMoreTokens()) {
        phrases.add(st.nextToken());
    }
    return phrases;
}

Recommended answer

From the documentation of StringTokenizer:

StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead.

If you use split, you can split the text into sentences with a regular expression. You probably want something like any of ?, ! or . followed by either whitespace or the end of the text:

text.split("[?!.]($|\\s)")
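As a rough illustration, the expression above can be wrapped in a small helper and run against the string from the question. This is only a sketch: the class and method names (SentenceSplitDemo, splitSentences, splitPhrases) are made up for this example, and the comma pattern in splitPhrases is an assumption of mine that applies the same idea to the 60,000 case. It is not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, named for this example only.
class SentenceSplitDemo {

    // Splits on '?', '!' or '.' only when the mark is followed by whitespace
    // or the end of the text, so the '.' in "6.00" is not a boundary.
    // Note that split() drops the punctuation mark itself.
    static List<String> splitSentences(String text) {
        List<String> sentences = new ArrayList<>();
        for (String part : text.split("[?!.]($|\\s)")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                sentences.add(trimmed);
            }
        }
        return sentences;
    }

    // Assumption (not from the answer): split on a comma unless it has a digit
    // on both sides, so "60,000" stays in one piece.
    static List<String> splitPhrases(String text) {
        List<String> phrases = new ArrayList<>();
        for (String part : text.split("(?<!\\d),|,(?!\\d)")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                phrases.add(trimmed);
            }
        }
        return phrases;
    }

    public static void main(String[] args) {
        String x = " i am going to the party at 6.00 in the evening. are you coming with me?";
        System.out.println(splitSentences(x));
        System.out.println(splitPhrases("it cost 60,000 dollars, which is a lot"));
    }
}
```

For the sample string this gives two sentences, with 6.00 left intact. A pattern like this still splits after abbreviations such as "Dr. " because it only looks at the character after the punctuation mark, so treat it as a starting point rather than a complete sentence detector.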
