Parsing multiple sections of a text file using regex in C#


Question

I would like to parse a text file whose content looks like this:

START-OF-DATA
#100846105
START SECURITY|US912810DZ85|CBBT|
## in: 20150430_14:59:00 to 20150430_15:00:00 [13 (New York-DST)]
## out:20150430_14:59:00 to 20150430_15:00:00 [13 (New York-DST)]
04/30|15:00:00|B|118.640625||| |A|118.703125||| ||
04/30|14:59:54|B|118.6328125||| |A|118.6953125||| ||
04/30|14:59:52|B|118.6328125||| |A|118.6953125||| ||
04/30|14:59:23|B|118.6328125||| |A|118.6953125||| ||
04/30|14:59:20|B|118.6328125||| |A|118.6953125||| ||
END SECURITY|US912810DZ85|0|
#100846111
START SECURITY|US912810EA26|CBBT|
## in: 20150430_14:59:00 to 20150430_15:00:00 [13 (New York-DST)]
## out:20150430_14:59:00 to 20150430_15:00:00 [13 (New York-DST)]
04/30|15:00:00|B|124.75||| |A|124.828125||| ||
04/30|14:59:55|B|124.75||| |A|124.8203125||| ||
04/30|14:59:53|B|124.7421875||| |A|124.8203125||| ||
04/30|14:59:45|B|124.7421875||| |A|124.8125||| ||
04/30|14:59:43|B|124.7421875||| |A|124.828125||| ||
04/30|14:59:27|B|124.7421875||| |A|124.8125||| ||
04/30|14:59:24|B|124.7421875||| |A|124.828125||| ||
04/30|14:59:22|B|124.7421875||| |A|124.8125||| ||
04/30|14:59:20|B|124.7421875||| |A|124.828125||| ||
04/30|14:59:13|B|124.7421875||| |A|124.8125||| ||
END SECURITY|US912810EA26|0|
END-OF-DATA

I am using the following code:

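string pattern = @"^(START-OF-DATA\r\n)(?<InstrumentsSection>[^\\]*?)(?:(^END-OF-DATA))";
// The construction of `regex` is not shown in the question; Multiline is assumed here,
// because ^END-OF-DATA can only match at the start of a line with that option set.
var regex = new Regex(pattern, RegexOptions.Multiline);
var expressionMatchColl = regex.Matches(File.ReadAllText(filePath));
foreach (Match match in expressionMatchColl)
{
    string[] instrumentRows = match.Groups["InstrumentsSection"].Value.Split(new string[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries);
    instruments = instrumentRows.ToList();
}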

This retrieves every line between START-OF-DATA and END-OF-DATA. However, I would like to ignore lines that begin with START SECURITY, ## and END SECURITY. I would also like to capture the tick values and the identifiers (e.g. 100846105, 100846111) in separate groups.

Can anyone advise?

Recommended answer

Here is a simple parser:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;


namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            List<Section> sections = new List<Section>();
            string input =
               "START-OF-DATA\n" +
               "#100846105\n" +
               "START SECURITY|US912810DZ85|CBBT|\n" +
               "## in: 20150430_14:59:00 to 20150430_15:00:00 [13 (New York-DST)]\n" +
               "## out:20150430_14:59:00 to 20150430_15:00:00 [13 (New York-DST)]\n" +
               "04/30|15:00:00|B|118.640625||| |A|118.703125||| ||\n" +
               "04/30|14:59:54|B|118.6328125||| |A|118.6953125||| ||\n" +
               "04/30|14:59:52|B|118.6328125||| |A|118.6953125||| ||\n" +
               "04/30|14:59:23|B|118.6328125||| |A|118.6953125||| ||\n" +
               "04/30|14:59:20|B|118.6328125||| |A|118.6953125||| ||\n" +
               "END SECURITY|US912810DZ85|0|\n" +
               "#100846111\n" +
               "START SECURITY|US912810EA26|CBBT|\n" +
               "## in: 20150430_14:59:00 to 20150430_15:00:00 [13 (New York-DST)]\n" +
               "## out:20150430_14:59:00 to 20150430_15:00:00 [13 (New York-DST)]\n" +
               "04/30|15:00:00|B|124.75||| |A|124.828125||| ||\n" +
               "04/30|14:59:55|B|124.75||| |A|124.8203125||| ||\n" +
               "04/30|14:59:53|B|124.7421875||| |A|124.8203125||| ||\n" +
               "04/30|14:59:45|B|124.7421875||| |A|124.8125||| ||\n" +
               "04/30|14:59:43|B|124.7421875||| |A|124.828125||| ||\n" +
               "04/30|14:59:27|B|124.7421875||| |A|124.8125||| ||\n" +
               "04/30|14:59:24|B|124.7421875||| |A|124.828125||| ||\n" +
               "04/30|14:59:22|B|124.7421875||| |A|124.8125||| ||\n" +
               "04/30|14:59:20|B|124.7421875||| |A|124.828125||| ||\n" +
               "04/30|14:59:13|B|124.7421875||| |A|124.8125||| ||\n" +
               "END SECURITY|US912810EA26|0|\n" +
               "END-OF-DATA\n";


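            StringReader reader = new StringReader(input);
            string inputLine;
            Section newSection = null;
            while ((inputLine = reader.ReadLine()) != null)
            {
                inputLine = inputLine.Trim();
                // Skip blank lines and "##" comment lines (the in:/out: headers).
                if (inputLine.Length == 0 || inputLine.StartsWith("##", StringComparison.Ordinal)) continue;
                if (inputLine.StartsWith("#", StringComparison.Ordinal))
                {
                    // A single "#" opens a new section; the rest of the line is its identifier.
                    newSection = new Section();
                    sections.Add(newSection);
                    newSection.iD = inputLine.Substring(1);
                    newSection.data = new List<string>();
                }
                else
                {
                    // StartsWith avoids the exception Substring throws on lines shorter than the prefix.
                    // These two checks also skip the START-OF-DATA and END-OF-DATA markers.
                    if (inputLine.StartsWith("END", StringComparison.Ordinal)) continue;
                    if (inputLine.StartsWith("START", StringComparison.Ordinal)) continue;
                    // Ignore rows that appear before any "#" identifier line.
                    if (newSection != null) newSection.data.Add(inputLine);
                }
            }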

        }
        public class Section
        {
            public string iD { get; set; }
            public List<string> data { get; set; }
        }
    }

}
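
Since the question asked for a regex specifically, here is a minimal sketch of how the same grouping might be done with named groups. It is not part of the original answer. It relies on the .NET feature that a group repeated inside a loop keeps every capture in Group.Captures. The names TickSection, RegexSectionParser and Eol are hypothetical, and the pattern assumes that identifier lines are "#" followed by digits and that tick rows start with an MM/dd date followed by "|", as in the sample data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical container, mirroring the Section class in the answer above.
public sealed class TickSection
{
    public string Id { get; set; }
    public List<string> Ticks { get; set; }
}

// Hypothetical helper; the pattern is an assumption based on the sample data only.
public static class RegexSectionParser
{
    // End of line: LF, CRLF, or end of input.
    private const string Eol = @"(?:\r?\n|\z)";

    // One match per section: an identifier line ("#" plus digits, so "##" lines
    // cannot start a section), followed by any number of lines that are either
    // ignored (START SECURITY, END SECURITY, ##) or captured in the "tick" group.
    private static readonly Regex SectionPattern = new Regex(
        @"^#(?<id>\d+)[ \t]*" + Eol +
        @"(?:" +
            @"(?:START SECURITY|END SECURITY|##)[^\r\n]*" + Eol +
            @"|" +
            @"(?<tick>\d{2}/\d{2}\|[^\r\n]*)" + Eol +
        @")*",
        RegexOptions.Multiline);

    public static List<TickSection> Parse(string text)
    {
        var sections = new List<TickSection>();
        foreach (Match match in SectionPattern.Matches(text))
        {
            sections.Add(new TickSection
            {
                Id = match.Groups["id"].Value,
                // Captures holds every row matched by the repeated "tick" group,
                // not just the last one.
                Ticks = match.Groups["tick"].Captures
                    .Cast<Capture>()
                    .Select(c => c.Value)
                    .ToList()
            });
        }
        return sections;
    }
}
```

With the question's variables this could be called as RegexSectionParser.Parse(File.ReadAllText(filePath)). The START-OF-DATA and END-OF-DATA lines need no special handling because they match neither the identifier line nor any alternative inside the loop. For large files, the line-by-line parser above avoids loading the whole file into one string.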
