How to separate paragraphs in a string


Problem description

Hi guys, I really need your help. I was trying to take a multi-line string consisting of a few paragraphs and split it into individual texts.

I realized that whenever I skip a line there is a \n\r sequence in the string. From that I concluded that each new line starts with a \n and ends with a \r. Therefore, I wrote the following code.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

namespace ConsoleApplication15
{
   class Program
   {
    struct ParagraphInfo
    {
        public ParagraphInfo(string text)
        {
            int i;
            Text = text;
            i = text.IndexOf('.');
            FirstSentence = text.Substring(0, i);
        }

        public string Text, FirstSentence;
    }

    static void Main(string[] args)
    {
        int tmp = 0;
        int tmp1 = 0;
        string MultiParagraphString = @"AA.aa.

BB.bb.

CC.cc.

DD.dd.

EE.ee.";

        List<ParagraphInfo> Paragraphs = new List<ParagraphInfo>();

        Regex NewParagraphFinder = new Regex(@"[\n][\r]");
        MatchCollection NewParagraphMatches = NewParagraphFinder.Matches(MultiParagraphString);


        for (int i = 0; i < NewParagraphMatches.Count; i++)
        {
            if (i == 0)
            {
                Paragraphs.Add(new ParagraphInfo((MultiParagraphString.Substring(0, NewParagraphMatches[0].Index))));
            }
            else if (i == (NewParagraphMatches.Count - 1))
            {
                tmp = NewParagraphMatches[i].Index + 3;
                tmp1 = MultiParagraphString.Length - NewParagraphMatches[i].Index - 3;
                Paragraphs.Add(new ParagraphInfo(MultiParagraphString.Substring(tmp, tmp1)));
            }
            else
            {
                tmp = NewParagraphMatches[i].Index + 3;
                tmp1 = NewParagraphMatches[i + 1].Index - NewParagraphMatches[i].Index+3;
                Paragraphs.Add(new ParagraphInfo(MultiParagraphString.Substring(tmp, tmp1)));
            }
        }

        Console.WriteLine(MultiParagraphString);
        foreach (ParagraphInfo Paragraph in Paragraphs)
        {
            Console.WriteLine(Paragraph.Text);

        }


    }
}
}

When I printed each member of Paragraphs one after another, alongside the entire text, something rather bizarre appeared. The output of the Paragraphs list was this:

AA.aa.


CC.cc.

DD.


DD.dd.

EE.


EE.ee.


I cannot understand why this keeps happening, and moreover I cannot figure out why the output is so different each time.

Sorry if it's a mess, but I really need some help here. BTW, if anyone has a better idea for doing this, feel free to share.

Thanks

Solution

You may try the following:

MultiParagraphString.Split(new [] {Environment.NewLine}, 
           StringSplitOptions.RemoveEmptyEntries);

That returns a string[] (which can be used as an IEnumerable<string>) containing the non-empty lines. If you want to transform them into your structures, just use Select:

MultiParagraphString.Split(new [] {Environment.NewLine}, 
           StringSplitOptions.RemoveEmptyEntries)
          .Select(s => new ParagraphInfo(s)).ToList();
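
For completeness, here is a minimal, self-contained sketch of the same Split idea. One caveat with Environment.NewLine is that it is "\r\n" on Windows but "\n" on Unix-like systems, so text written on one platform may not split cleanly on another. The sketch therefore passes both separators explicitly. The class name ParagraphSplitter and the method SplitParagraphs are hypothetical names chosen for illustration; they are not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ParagraphSplitter
{
    // Hypothetical helper (not from the original answer): splits text into
    // paragraphs on Windows ("\r\n") and Unix ("\n") line breaks alike.
    public static List<string> SplitParagraphs(string text)
    {
        string[] separators = { "\r\n", "\n" };

        return text
            // RemoveEmptyEntries drops the empty strings produced by blank lines.
            .Split(separators, StringSplitOptions.RemoveEmptyEntries)
            // Trim removes stray whitespace (for example a lone '\r').
            .Select(paragraph => paragraph.Trim())
            // Discard entries that contained only whitespace.
            .Where(paragraph => paragraph.Length > 0)
            .ToList();
    }

    public static void Main()
    {
        string text = "AA.aa.\r\n\r\nBB.bb.\r\n\r\nCC.cc.";

        // Prints AA.aa., BB.bb. and CC.cc., one per line.
        foreach (string paragraph in SplitParagraphs(text))
        {
            Console.WriteLine(paragraph);
        }
    }
}
```

The resulting strings can then be passed through Select(s => new ParagraphInfo(s)) as shown above. Note that the ParagraphInfo constructor from the question throws if a paragraph contains no '.', because IndexOf returns -1 in that case.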
