How to separate paragraphs in a string?
Question
I was trying to take a multi-line string that consists of a few paragraphs and split it into individual texts.
I realized that whenever I skip a line, there is a \n\r sequence in the string. From that I assumed that each new line starts with \n and ends with \r. Therefore, I wrote the following code.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

namespace ConsoleApplication15
{
    class Program
    {
        struct ParagraphInfo
        {
            public ParagraphInfo(string text)
            {
                int i;
                Text = text;
                i = text.IndexOf('.');
                FirstSentence = text.Substring(0, i);
            }
            public string Text, FirstSentence;
        }

        static void Main(string[] args)
        {
            int tmp = 0;
            int tmp1 = 0;
            string MultiParagraphString = @"AA.aa.
BB.bb.
CC.cc.
DD.dd.
EE.ee.";
            List<ParagraphInfo> Paragraphs = new List<ParagraphInfo>();
            Regex NewParagraphFinder = new Regex(@"[\n][\r]");
            MatchCollection NewParagraphMatches = NewParagraphFinder.Matches(MultiParagraphString);
            for (int i = 0; i < NewParagraphMatches.Count; i++)
            {
                if (i == 0)
                {
                    Paragraphs.Add(new ParagraphInfo((MultiParagraphString.Substring(0, NewParagraphMatches[0].Index))));
                }
                else if (i == (NewParagraphMatches.Count - 1))
                {
                    tmp = NewParagraphMatches[i].Index + 3;
                    tmp1 = MultiParagraphString.Length - NewParagraphMatches[i].Index - 3;
                    Paragraphs.Add(new ParagraphInfo(MultiParagraphString.Substring(tmp, tmp1)));
                }
                else
                {
                    tmp = NewParagraphMatches[i].Index + 3;
                    tmp1 = NewParagraphMatches[i + 1].Index - NewParagraphMatches[i].Index + 3;
                    Paragraphs.Add(new ParagraphInfo(MultiParagraphString.Substring(tmp, tmp1)));
                }
            }
            Console.WriteLine(MultiParagraphString);
            foreach (ParagraphInfo Paragraph in Paragraphs)
            {
                Console.WriteLine(Paragraph.Text);
            }
        }
    }
}
When I printed each member of Paragraphs one after another, alongside the entire text, something rather bizarre appeared. The output of the Paragraphs list was this:
AA.aa.
CC.cc.
DD.
DD.dd.
EE.
EE.ee.
I cannot understand why this keeps happening, and moreover I cannot figure out why the output is so different each time.
Sorry if it's a mess, but I really need some help here. If anyone has a better way to do it, feel free to share.
Recommended answer
You can try the following:
MultiParagraphString.Split(new[] { Environment.NewLine },
                           StringSplitOptions.RemoveEmptyEntries);
That returns a string[], which can be used as an IEnumerable<string>. If you want to transform the entries into your structure, just use Select:
MultiParagraphString.Split(new[] { Environment.NewLine },
                           StringSplitOptions.RemoveEmptyEntries)
                    .Select(s => new ParagraphInfo(s))
                    .ToList();
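A few notes on why the original loop likely misbehaves, followed by a sketch. A Windows line break is \r\n, so the sequence \n\r only appears across a blank line (\r\n\r\n), and the second argument of Substring is a length, not an end index. Splitting on Environment.NewLine also depends on the platform, while a verbatim string literal carries whatever line endings the source file was saved with. The sketch below is a minimal version that splits on all three common line endings and guards against a paragraph that contains no '.'. The names ParagraphSplitter, SplitParagraphs, and FirstSentence are hypothetical helpers introduced for illustration; they are not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ParagraphSplitter
{
    // Hypothetical helper: splits text into non-empty, trimmed paragraphs.
    public static List<string> SplitParagraphs(string text)
    {
        // Cover Windows (\r\n), Unix (\n), and classic Mac (\r) line endings,
        // so the result does not depend on Environment.NewLine.
        string[] separators = { "\r\n", "\n", "\r" };
        return text
            .Split(separators, StringSplitOptions.RemoveEmptyEntries)
            .Select(p => p.Trim())
            .Where(p => p.Length > 0)
            .ToList();
    }

    // Hypothetical helper: returns the text before the first '.', or the
    // whole paragraph when there is none. IndexOf returns -1 in that case,
    // and Substring(0, -1) would throw ArgumentOutOfRangeException.
    public static string FirstSentence(string paragraph)
    {
        int end = paragraph.IndexOf('.');
        return end < 0 ? paragraph : paragraph.Substring(0, end);
    }
}

public static class Demo
{
    public static void Main()
    {
        // Mixed line endings and a blank line, purely as sample input.
        string text = "AA.aa.\r\n\r\nBB.bb.\nCC.cc.";
        foreach (string p in ParagraphSplitter.SplitParagraphs(text))
        {
            Console.WriteLine(p + " -> " + ParagraphSplitter.FirstSentence(p));
        }
    }
}
```

Like the answer above, this treats every non-empty line as its own paragraph. If paragraphs can span several lines and are separated by blank lines only, the separators would need to be the doubled forms (for example "\r\n\r\n" and "\n\n") instead.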