How to parse / split English sentences in C#


Problem description

Please help me, programmers. I'm really confused about how to write code that parses English sentences.
How do I recognize where one sentence ends and the next one begins when the text has embedded punctuation marks?
For example:

Input:

First sentence. Second sentence! Third sentence? .Yes.
.....Nice to meet you.... I am okay


Output:
arr[0] = First sentence
arr[1] = Second sentence
arr[2] = Third sentence
arr[3] = Yes
arr[4] = Nice to meet you
arr[5] = I am okay


Explanation:
I want to split on { ?, !, runs of multiple dots, or strings that carry no meaning }.

This is my code:

string[] word = new string[100];
string inputRtb = rtbInput.Text;

string plot1 = "";
string plot2 = "";

string[] splitString = inputRtb.Split(new char[] {' ', '\t', '\n'});

int j = 0;
int pos = 0;
for (int i = 0 ; i < splitString.Length; i++)
{
    if (splitString[i].Trim() != "" && splitString[i].Trim() != ".")
    {
        // Index into the trimmed token so the last-character check cannot go out of range.
        string token = splitString[i].Trim();
        if (token[token.Length - 1] == '.')
        {
            plot2 = token.Substring(0, token.Length - 1); // drop the trailing dot
            if (plot1 == "")
                plot1 += plot2;
            else
                plot1 += " " + plot2;
            pos++;
        }

        else if (plot1 == "")
            plot1 += splitString[i].Trim();
        else
            plot1 += " " + splitString[i].Trim();
    }


    if (plot1 != "" && splitString[i].Trim() == ".")
    {
        word[j++] = plot1;
        plot1 = "";
    }
    else if (pos > 0)
    {
        word[j++] = plot1;
        plot1 = "";
        pos--;
    }
    else if (plot1 != "" && i == splitString.Length-1)
    {
        word[j++] = plot1;
        plot1 = "";
    }
}

Recommended answer

This task is not as trivial as it sounds: whether a punctuation character ends a sentence depends on its context.
E.g. the dots in "E.g." at the beginning of this line do not each end a sentence. ;-)
Likewise, "(see item 1. above)" does not end after the dot.
There are many more such cases, also with other punctuation characters.

But it looks like this is not the topic of the question.
So, if you simply want to get the chunks of text between some delimiters, treating repeated delimiters as one delimiter and stripping leading and trailing spaces from the chunks found, then the following would do:
// requires: using System; using System.Linq;
string fullText = "..."; // input text
char[] delim = ".?!;".ToCharArray(); // add more single-character delimiters as needed
var sentences = fullText.Split(delim, StringSplitOptions.RemoveEmptyEntries)
                        .Select(s => s.Trim())
                        .Where(s => s.Length > 0); // drop chunks that contained only whitespace
foreach (var s in sentences) Console.WriteLine(s);



Cheers
Andi
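
To see how the delimiter-based split described above behaves on the sample input from the question, here is a minimal, self-contained sketch. The class name `SentenceSplitDemo` and the method name `SplitSentences` are made up for illustration; the delimiter set is the one from the answer, and the extra `Where` step is an assumption that whitespace-only chunks (such as the space between "?" and ".Yes") should be discarded.

```csharp
using System;
using System.Linq;

public static class SentenceSplitDemo
{
    // Hypothetical helper: split on single-character delimiters,
    // trim each chunk, and drop chunks that end up empty.
    public static string[] SplitSentences(string text)
    {
        char[] delimiters = ".?!;".ToCharArray();
        return text
            .Split(delimiters, StringSplitOptions.RemoveEmptyEntries)
            .Select(chunk => chunk.Trim())
            .Where(chunk => chunk.Length > 0)
            .ToArray();
    }

    public static void Main()
    {
        // Sample input taken from the question.
        string input = "First sentence. Second sentence! Third sentence? .Yes.\n"
                     + ".....Nice to meet you.... I am okay";

        string[] arr = SplitSentences(input);
        for (int i = 0; i < arr.Length; i++)
            Console.WriteLine("arr[" + i + "] = " + arr[i]);
    }
}
```

On the sample input this prints six entries, `arr[0] = First sentence` through `arr[5] = I am okay`. It remains a purely character-based split: text such as "(see item 1. above)" is still cut at the dot, which is exactly the context problem described in the answer.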


Here is a rough algorithm, but it should get the job done.

public static string[] ParseSentences(string sentence)
{
    char[] terminators = { '.', '?', '!' };

    List<string> sentences = new List<string>(
        sentence.Split(terminators, StringSplitOptions.RemoveEmptyEntries));
    for (int i = sentences.Count - 1; i >= 0; i--)
        if (sentences[i].Trim().Length == 0)
            sentences.RemoveAt(i);
        else
            sentences[i] = sentences[i].Trim();

    return sentences.ToArray();
}


This one line will do the trick; you just need to add all the terminator characters to the array.
string[] splitString = inputRtb.Split(new char[] { '!', '?', '.', '\t', '\n' }, StringSplitOptions.RemoveEmptyEntries);
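
Note that the one-liner above returns the chunks untrimmed, so entries such as " Second sentence" or a lone " " can remain in the array. As a hedged alternative (not what the answer uses), the sketch below swaps in `Regex.Split` with a character class, so that a run of terminators, tabs, or line breaks counts as one separator, and then trims and filters the result. The names `RegexSentenceSplit` and `SplitOnTerminators` are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexSentenceSplit
{
    // Hypothetical helper: one or more of . ? ! tab CR LF acts as a single separator.
    public static string[] SplitOnTerminators(string text)
    {
        return Regex.Split(text, @"[.?!\t\r\n]+")
            .Select(chunk => chunk.Trim())      // strip leading and trailing whitespace
            .Where(chunk => chunk.Length > 0)   // drop empty and whitespace-only chunks
            .ToArray();
    }

    public static void Main()
    {
        string input = ".....Nice to meet you.... I am okay";
        foreach (string s in SplitOnTerminators(input))
            Console.WriteLine(s);
    }
}
```

For that input the sketch prints "Nice to meet you" and "I am okay" on separate lines. Like the other answers, it ignores context, so abbreviations and numbered references will still be split.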
