What is a regular expression for parsing out individual sentences?

Problem description

I am looking for a good .NET regular expression that I can use for parsing out individual sentences from a body of text.

It should be able to parse the following block of text into exactly six sentences:

Hello world! How are you? I am fine.
This is a difficult sentence because I use I.D.

Newlines should also be accepted. Numbers should not cause  
sentence breaks, like 1.23.

This is proving a little more challenging than I originally thought.

Any help would be greatly appreciated. I am going to use this to train the system on known bodies of text.

Recommended answer

Try this: @"(\S.+?[.!?])(?=\s+|$)"

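using System;
using System.Text.RegularExpressions;

string str = @"Hello world! How are you? I am fine. This is a difficult sentence because I use I.D.
Newlines should also be accepted. Numbers should not cause sentence breaks, like 1.23.";

// \S.+?     : start at a non-whitespace character and match lazily
// [.!?]     : up to a sentence terminator
// (?=\s+|$) : that is followed by whitespace or the end of the input
Regex rx = new Regex(@"(\S.+?[.!?])(?=\s+|$)");
foreach (Match match in rx.Matches(str)) {
    Console.WriteLine(match.Value);
}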

Result:

Hello world!
How are you?
I am fine.
This is a difficult sentence because I use I.D.
Newlines should also be accepted.
Numbers should not cause sentence breaks, like 1.23.
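One caveat: in the question's sample text, the last sentence wraps onto a second line, while the test string above keeps it on one line. By default `.` does not match `\n` in .NET, so a wrapped sentence would probably be cut at the line break. The following is a minimal sketch, not part of the original answer, that reuses the same pattern with `RegexOptions.Singleline` and then collapses the whitespace inside each match. The class name `SentenceSplitter` and the method `Split` are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper; the name and shape are assumptions for this sketch.
public static class SentenceSplitter
{
    // Same pattern as in the answer. Singleline makes '.' match '\n' as well,
    // so a sentence that wraps onto the next line stays in one match.
    private static readonly Regex SentencePattern =
        new Regex(@"(\S.+?[.!?])(?=\s+|$)", RegexOptions.Singleline);

    public static List<string> Split(string text)
    {
        var sentences = new List<string>();
        foreach (Match match in SentencePattern.Matches(text))
        {
            // Collapse runs of whitespace (including line breaks) to one space.
            sentences.Add(Regex.Replace(match.Value, @"\s+", " "));
        }
        return sentences;
    }

    public static void Main()
    {
        // The sample from the question, with its original line breaks.
        string text = "Hello world! How are you? I am fine.\n" +
                      "This is a difficult sentence because I use I.D.\n\n" +
                      "Newlines should also be accepted. Numbers should not cause  \n" +
                      "sentence breaks, like 1.23.";

        foreach (string sentence in Split(text))
        {
            Console.WriteLine(sentence);
        }
    }
}
```

The pattern still treats any `.`, `!` or `?` followed by whitespace as a sentence end, so an abbreviation such as "I.D." in the middle of a sentence would still cause a false break.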

For complicated ones, of course, you will need a real parser like SharpNLP or NLTK. Mine is just a quick and dirty one.

Here is some information about SharpNLP and its features:

SharpNLP is a collection of natural language processing tools written in C#. Currently it provides the following NLP tools:


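  • a sentence splitter
  • a tokenizer
  • a part-of-speech tagger
  • a chunker (used to find non-recursive syntactic annotations such as noun phrase chunks)
  • a parser
  • a name finder
  • a coreference tool
  • an interface to the WordNet lexical database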
