What is a regular expression for parsing out individual sentences?
Question
I am looking for a good .NET regular expression that I can use for parsing out individual sentences from a body of text.
It should be able to parse the following block of text into exactly six sentences:
Hello world! How are you? I am fine.
This is a difficult sentence because I use I.D.
Newlines should also be accepted. Numbers should not cause
sentence breaks, like 1.23.
This is proving a little more challenging than I originally thought.
Any help would be greatly appreciated. I am going to use this to train the system on known bodies of text.
Recommended answer
Try this pattern: @"(\S.+?[.!?])(?=\s+|$)"
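using System;
using System.Text.RegularExpressions;

string str = @"Hello world! How are you? I am fine. This is a difficult sentence because I use I.D.
Newlines should also be accepted. Numbers should not cause sentence breaks, like 1.23.";
// A sentence starts at a non-whitespace character and runs lazily to the first
// '.', '!' or '?' that is followed by whitespace or by the end of the input.
Regex rx = new Regex(@"(\S.+?[.!?])(?=\s+|$)");
foreach (Match match in rx.Matches(str)) {
    int i = match.Index; // offset of the sentence within str (not used below)
    Console.WriteLine(match.Value);
}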
Result:
Hello world!
How are you?
I am fine.
This is a difficult sentence because I use I.D.
Newlines should also be accepted.
Numbers should not cause sentence breaks, like 1.23.
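One caveat: in .NET, "." does not match a newline by default. A sentence that wraps onto a second line, as "Numbers should not cause / sentence breaks, like 1.23." does in the question's text, would therefore come back without its first line. Below is a minimal sketch of one way around this. It assumes RegexOptions.Singleline is acceptable and that line breaks inside a sentence should be collapsed to single spaces. The names SentenceSplitter and Split are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SentenceSplitter
{
    // Same pattern as in the answer; Singleline lets '.' also match '\n',
    // so a sentence may span several lines.
    private static readonly Regex SentencePattern =
        new Regex(@"(\S.+?[.!?])(?=\s+|$)", RegexOptions.Singleline);

    // Used to collapse line breaks and repeated spaces inside a sentence.
    private static readonly Regex Whitespace = new Regex(@"\s+");

    // Hypothetical helper: returns each matched sentence on a single line.
    public static List<string> Split(string text)
    {
        var sentences = new List<string>();
        foreach (Match match in SentencePattern.Matches(text))
        {
            sentences.Add(Whitespace.Replace(match.Value, " "));
        }
        return sentences;
    }

    public static void Main()
    {
        // The text from the question, with its original line breaks.
        string text =
            "Hello world! How are you? I am fine.\n" +
            "This is a difficult sentence because I use I.D.\n" +
            "Newlines should also be accepted. Numbers should not cause\n" +
            "sentence breaks, like 1.23.";

        foreach (string sentence in Split(text))
        {
            Console.WriteLine(sentence);
        }
    }
}
```

With this option the pattern should still stop at the first terminator followed by whitespace, because the quantifier is lazy; only the treatment of line breaks changes.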
For complicated ones, of course, you will need a real parser like SharpNLP or NLTK. Mine is just a quick and dirty one.
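To make "complicated" concrete: the quick pattern treats any terminator followed by whitespace as a sentence end, so a title such as "Dr." in "Dr. Smith arrived." is split off as its own sentence. The sketch below is a stopgap, not a substitute for a real sentence detector. It assumes a small hand-written abbreviation list (entirely hypothetical and far from complete) and glues a fragment onto the next one when it ends in a listed abbreviation. The trade-off is that a sentence which really does end in one of those abbreviations gets merged with its successor.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AbbreviationAwareSplitter
{
    // Hypothetical, deliberately tiny list; real text needs many more entries.
    private static readonly HashSet<string> Abbreviations =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "Dr.", "Mr.", "Mrs.", "e.g.", "i.e."
        };

    // The quick pattern from the answer, used to produce candidate fragments.
    private static readonly Regex Candidate =
        new Regex(@"(\S.+?[.!?])(?=\s+|$)", RegexOptions.Singleline);

    private static readonly char[] WordSeparators = { ' ', '\n', '\r', '\t' };

    public static List<string> Split(string text)
    {
        var sentences = new List<string>();
        string pending = null; // fragment waiting to be joined to the next one

        foreach (Match match in Candidate.Matches(text))
        {
            string piece = pending == null ? match.Value : pending + " " + match.Value;

            // Last whitespace-delimited token of the fragment, e.g. "Dr.".
            string lastWord = piece.Substring(piece.LastIndexOfAny(WordSeparators) + 1);

            if (Abbreviations.Contains(lastWord))
            {
                pending = piece; // probably not a real sentence end; keep going
            }
            else
            {
                sentences.Add(piece);
                pending = null;
            }
        }

        if (pending != null)
        {
            sentences.Add(pending); // text ended on an abbreviation
        }
        return sentences;
    }

    public static void Main()
    {
        foreach (string s in Split("Dr. Smith arrived. He sat down."))
        {
            Console.WriteLine(s);
        }
    }
}
```

Lists like this grow quickly and never cover every case, which is the practical argument for a trained sentence detector such as the ones in SharpNLP or NLTK.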
Here is some information about SharpNLP and its features:
SharpNLP is a collection of natural language processing tools written in C#. Currently it provides the following NLP tools:
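- a sentence splitter
- a tokenizer
- a part-of-speech tagger
- a chunker (used to find non-recursive syntactic annotations such as noun phrase chunks)
- a parser
- a name finder
- a coreference tool
- an interface to the WordNet lexical database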