Regular expression to match a sentence

Question

How can I match a sentence of the form "Hello world" or "Hello World"? The sentence may contain hyphens (-), slashes (/) and digits (0-9). Any information would be very helpful. Thank you.

Recommended answer

This one will do a pretty good job. My definition of a sentence: a sentence begins with a character that is neither whitespace nor ending punctuation, and it ends with a period, exclamation point, or question mark (or the end of the string). A closing quote may follow the ending punctuation.

[^.!?\s][^.!?]*(?:[.!?](?!['"]?\s|$)[^.!?]*)*[.!?]?['"]?(?=\s|$)

import java.util.regex.*;
public class TEST {
    public static void main(String[] args) {
        // Sample input: four punctuated sentences and one with no ending punctuation.
        String subjectString =
            "This is a sentence. " +
            "So is \"this\"! And is \"this?\" " +
            "This is 'stackoverflow.com!' " +
            "Hello World";
        Pattern re = Pattern.compile(
            "# Match a sentence ending in punctuation or EOS.\n" +
            "[^.!?\\s]    # First char is non-punct, non-ws\n" +
            "[^.!?]*      # Greedily consume up to punctuation.\n" +
            "(?:          # Group for unrolling the loop.\n" +
            "  [.!?]      # (special) inner punctuation ok if\n" +
            "  (?!['\"]?\\s|$)  # not followed by ws or EOS.\n" +
            "  [^.!?]*    # Greedily consume up to punctuation.\n" +
            ")*           # Zero or more (special normal*)\n" +
            "[.!?]?       # Optional ending punctuation.\n" +
            "['\"]?       # Optional closing quote.\n" +
            "(?=\\s|$)", 
            // COMMENTS permits the whitespace and # comments above; MULTILINE lets $ match at line ends.
            Pattern.MULTILINE | Pattern.COMMENTS);
        Matcher reMatcher = re.matcher(subjectString);
        while (reMatcher.find()) {
            System.out.println(reMatcher.group());
        } 
    }
}

This is the output:

This is a sentence.
So is "this"!
And is "this?"
This is 'stackoverflow.com!'
Hello World

Matching all of these correctly (with the last sentence having no ending punctuation) turns out to be not as easy as it seems!
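
To tie this back to the question, here is a minimal sketch that wraps the same pattern (in single-line form) in a helper that returns the matches as a list. The class name SentenceSplitter and the method name split are hypothetical, chosen only for illustration, and the two input strings are made-up test data. The first input shows that hyphens, slashes, and digits are treated as ordinary characters. The second shows one limit of the definition: a period followed by whitespace always ends a match, so an abbreviation such as "Dr." is reported as its own sentence.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the class and method names are not from the answer.
class SentenceSplitter {
    // Single-line form of the pattern from the answer, compiled once.
    private static final Pattern SENTENCE = Pattern.compile(
        "[^.!?\\s][^.!?]*(?:[.!?](?!['\"]?\\s|$)[^.!?]*)*[.!?]?['\"]?(?=\\s|$)",
        Pattern.MULTILINE);

    // Collects every match of the pattern, in order of appearance.
    static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        Matcher m = SENTENCE.matcher(text);
        while (m.find()) {
            sentences.add(m.group());
        }
        return sentences;
    }

    public static void main(String[] args) {
        // Hyphens, slashes and digits are ordinary characters to the pattern.
        System.out.println(split("Build 2-1 shipped on 10/12/2010. Hello World"));
        // A period followed by whitespace ends a match, so the abbreviation splits off.
        System.out.println(split("Dr. Smith arrived."));
    }
}
```

Run as written, this should print [Build 2-1 shipped on 10/12/2010., Hello World] and then [Dr., Smith arrived.]. Handling abbreviations would need extra rules beyond the pattern shown here.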
