.NET regex engine returns no matches but I am expecting 8

Problem description

I am trying to write a regex that gets each row of values from the INSERT statements in a SQL script. When I use the .NET Regex Tester on Regex Hero, I get the 8 matches I expect. However, when I run this snippet as a console app, it returns no matches.

const string text =
@"INSERT INTO [AdminPrefs] ( [SpayClinic] , [VaxClinic] , [ShelterClinic] , [DateModified] , [Prefix] , [UpdateCounter] , [LockedRecs] , [dbName] , [Timer] , [MedCtrClinic] , [OtherClinic] , [Da2PPPx] , [Da2PPEPx] , [FVRCPPx] , [FVRCPEPx] , [FELVTPx] , [FELVTEPx] , [FELVVPx] , [FELVVEPx] , [HWTPx] , [HWTEPx] , [RabiesPx] , [RabiesEPx] , [FIVTest] , [FIVTestE] , [OnePlusChar] , [XSHWMPx] , [XSHWMEPx] , [SHWMPx] , [SHWMEPx] , [MHWMPx] , [MHWMEPx] , [LHWMPx] , [LHWMEPx] , [DebuggerOn] , [PayThisAmount] , [free6] , [XSHWMPillPx] , [XSHWMPillEPx] , [SHWMPillPx] , [SHWMPillEPx] , [MHWMPillPx] , [MHWMPillEPx] , [LHWMPillPx] , [LHWMPillEPx] , [free7] , [free8] , [free9] , [XSPMPx] , [XSPMEPx] , [SPMPx] , [SPMEPx] , [MPMPx] , [MPMEPx] , [LPMPx] , [LPMEPx] , [ReceiptFooter] , [MonthsUntilBenefits] , [free12] , [XSPMPillPx] , [XSPMPillEPx] , [SPMPillPx] , [SPMPillEPx] , [MPMPillPx] , [MPMPillEPx] , [LPMPillPx] , [LPMPillEPx] , [free14] , [ClinicName] , [ShelterName] , [ShelterAbbr] , [Address1] , [Address2] , [City] , [State] , [ZipCode] , [MainPhone] , [MainFax] , [SplashPict] , [free17] , [free18] , [LicenseNo] , [SerialNo] , [free20] , [free21] , [free22] , [VLogCC] , [SNLogCC] , [free23] , [free24] , [free25] , [AgeAndBDay] , [free26] , [free27] , [free28] , [CurrRouteNum] )
VALUES
(12 , 7 , 0 , '0000/00/00 00:00:00:00' , '' , 0 , '' , '' , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , '' , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , '' , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , '' , '' , '' , '' , '' , '' , '' , '' , '' , '' , X'5443503408' , 0 , 0 , '' , 0 , 0 , 0 , 0 , '' , '' , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0),
(15 , 53 , 0 , '0000/00/00 00:00:00:00' , '' , 0 , '' , '' , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , '' , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , '' , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , '' , '' , '' , '' , '' , '' , '' , '' , '' , '' , X'5443503408' , 0 , 0 , '' , 0 , 0 , 0 , 0 , '' , '' , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0),
(20 , 216 , 0 , '0000/00/00 00:00:00:00' , '' , 0 , '' , '' , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , '' , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , '' , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , '' , '' , '' , '' , '' , '' , '' , '' , '' , '' , X'5443503408' , 0 , 0 , '' , 0 , 0 , 0 , 0 , '' , '' , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0),
(16 , 8 , 0 , '0000/00/00 00:00:00:00' , '' , 0 , '' , '' , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , '' , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , '' , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , '' , '' , '' , '' , '' , '' , '' , '' , '' , '' , X'5443503408' , 0 , 0 , '' , 0 , 0 , 0 , 0 , '' , '' , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0);

INSERT INTO [AdminPrefs] ( [SpayClinic] , [VaxClinic] , [ShelterClinic] , [DateModified] , [Prefix] , [UpdateCounter] , [LockedRecs] , [dbName] , [Timer] , [MedCtrClinic] , [OtherClinic] , [Da2PPPx] , [Da2PPEPx] , [FVRCPPx] , [FVRCPEPx] , [FELVTPx] , [FELVTEPx] , [FELVVPx] , [FELVVEPx] , [HWTPx] , [HWTEPx] , [RabiesPx] , [RabiesEPx] , [FIVTest] , [FIVTestE] , [OnePlusChar] , [XSHWMPx] , [XSHWMEPx] , [SHWMPx] , [SHWMEPx] , [MHWMPx] , [MHWMEPx] , [LHWMPx] , [LHWMEPx] , [DebuggerOn] , [PayThisAmount] , [free6] , [XSHWMPillPx] , [XSHWMPillEPx] , [SHWMPillPx] , [SHWMPillEPx] , [MHWMPillPx] , [MHWMPillEPx] , [LHWMPillPx] , [LHWMPillEPx] , [free7] , [free8] , [free9] , [XSPMPx] , [XSPMEPx] , [SPMPx] , [SPMEPx] , [MPMPx] , [MPMEPx] , [LPMPx] , [LPMEPx] , [ReceiptFooter] , [MonthsUntilBenefits] , [free12] , [XSPMPillPx] , [XSPMPillEPx] , [SPMPillPx] , [SPMPillEPx] , [MPMPillPx] , [MPMPillEPx] , [LPMPillPx] , [LPMPillEPx] , [free14] , [ClinicName] , [ShelterName] , [ShelterAbbr] , [Address1] , [Address2] , [City] , [State] , [ZipCode] , [MainPhone] , [MainFax] , [SplashPict] , [free17] , [free18] , [LicenseNo] , [SerialNo] , [free20] , [free21] , [free22] , [VLogCC] , [SNLogCC] , [free23] , [free24] , [free25] , [AgeAndBDay] , [free26] , [free27] , [free28] , [CurrRouteNum] )
VALUES
(26 , 5 , 0 , '0000/00/00 00:00:00:00' , '' , 0 , '' , '' , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , '' , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , '' , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , '' , '' , '' , '' , '' , '' , '' , '' , '' , '' , X'5443503408' , 0 , 0 , '' , 0 , 0 , 0 , 0 , '' , '' , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0),
(18 , 12 , 0 , '0000/00/00 00:00:00:00' , '' , 0 , '' , '' , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , '' , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , '' , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , '' , '' , '' , '' , '' , '' , '' , '' , '' , '' , X'5443503408' , 0 , 0 , '' , 0 , 0 , 0 , 0 , '' , '' , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0),
(9 , 10 , 0 , '0000/00/00 00:00:00:00' , '' , 0 , '' , '' , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , '' , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , '' , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , '' , '' , '' , '' , '' , '' , '' , '' , '' , '' , X'5443503408' , 0 , 0 , '' , 0 , 0 , 0 , 0 , '' , '' , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0),
(2 , 72 , 0 , '0000/00/00 00:00:00:00' , '' , 0 , '' , '' , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , '' , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , '' , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , '' , '' , '' , '' , '' , '' , '' , '' , '' , '' , X'5443503408' , 0 , 0 , '' , 0 , 0 , 0 , 0 , '' , '' , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0);
";

static void Main(string[] args)
{
    string query = @"^\(.*?\)(,|;)$";

    var matches = Regex.Matches(text, query, RegexOptions.Singleline | RegexOptions.Multiline);

    Console.WriteLine("Expected Matches: 8");
    Console.WriteLine("Matches Found: {0}", matches.Count);

    Console.ReadLine();
}

My options are exactly the same on the website and in my code (Multiline and Singleline), and both should be using the same .NET regex engine, so what is causing the difference between the two?


Final Results:

For anyone curious, my final regex was:

@"(?<=^\()                       # The beginning of a line followed by a (
((('(?<c>.*?)'(?!')(?=[\s\)])) | # Text string in SQL supports line breaks
  (?<c>-?[\d\.]+) |              # Any numbers
  (X'(?<c>[0-9a-f]*)')           # Something formatted like X'0123456789abcdef'
  )(\s,\s)?                      # Spaces and commas between the records
)+                               # Repeat the pattern at least one time
(?=(?<!'')\)[;,]\r?$)            # The end of the line, ending with ); or ), and not immediately preceded by ''";
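To see what this final pattern captures, here is a minimal sketch (a C# 9+ top-level program) that runs it, with its comments shortened, over one made-up row holding a number, a quoted string, an empty string, a hex literal and a negative decimal. The sample row is an assumption for brevity; the real rows are much longer. The options are the ones the parser below passes to Regex.Matches.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// The final record pattern from above; only the comments are shortened.
const string recordRegex = @"(?<=^\()                 # start of a line followed by (
((('(?<c>.*?)'(?!')(?=[\s\)])) |   # quoted text, may span lines
  (?<c>-?[\d\.]+) |                # numbers
  (X'(?<c>[0-9a-f]*)')             # hex literal such as X'0123abcd'
  )(\s,\s)?                        # separator between values
)+                                 # one or more values
(?=(?<!'')\)[;,]\r?$)              # row ends with ), or ); not preceded by ''";

// Hypothetical sample: a VALUES line followed by one short CR/LF-terminated row.
string text = "VALUES\r\n(12 , 'a b' , '' , X'5443' , -1.5),\r\n";

var options = RegexOptions.Singleline | RegexOptions.Multiline |
              RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase |
              RegexOptions.ExplicitCapture;

Match record = Regex.Match(text, recordRegex, options);

// Every value is captured by a group named "c", so all of them share one Captures collection.
string[] values = record.Groups["c"].Captures.Cast<Capture>().Select(cap => cap.Value).ToArray();
Console.WriteLine(string.Join(" | ", values));
```

Note that the quotes around text values and the X'...' wrapper are not part of the captures; the hex capture is only the digits.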

Note to anyone planning to use this for R&D ("rip-off and deploy") development: this only works for my SQL because it is very regular. If used with SQL that was not generated by my third-party program, it would need tweaking to handle many edge cases that I do not have to deal with.

Here is the full code of the parser. Hopefully it will help someone else who is stuck on something similar.

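// Needs: using System; using System.Data; using System.Data.SqlClient; using System.IO;
//        using System.Linq; using System.Text.RegularExpressions;
// Defined elsewhere (not shown): conn (assumed to be an open SqlConnection), _exportFolder, StringToByteArray.
foreach (var tableFolder in Directory.GetDirectories(_exportFolder))
{
    //Populate the schema of the DataTable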
    DataTable table = new DataTable();
    using (SqlDataAdapter ada = new SqlDataAdapter(String.Format("Select top 0 * from [{0}]", Path.GetFileName(tableFolder)), conn))
    {
        ada.Fill(table);
    }

    //All of the files to import for this table
    string[] filePaths = Directory.GetFiles(tableFolder, "*.sql");

    foreach (string file in filePaths)
    {
        string text;
        using (var txtRdr = new StreamReader(file))
        {
            text = txtRdr.ReadToEnd();
        }

        const string recordRegex =
                        @"(?<=^\()                       # The beginning of a line followed by a (
                        ((('(?<s>.*?)'(?!')(?=[\s\)])) | # Something formatted like 'some text' supports line breaks
                            (?<n>-?[\d\.]+) |            # Any numbers
                            (X'(?<h>[0-9a-f]*)')         # Something formatted like X'0123456789abcdef'
                            )(\s,\s)?                    # Spaces and commas between the records
                        )+                               # Repeat the pattern at least one time
                        (?=(?<!'')\)[;,]\r?$)            # The end of the line, ending with ); or ), and not immediately preceded by ''";

        //Creates one match per row in the database
        var records = Regex.Matches(text, recordRegex, RegexOptions.Singleline | RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture);

        const string headerRegex = @"^INSERT\sINTO\s\[[\w_\-\s]+\]\s\(\s(?:\[([\w_\-\s]+)\]\s(?:,\s)?)+\)";
        var header = Regex.Match(text, headerRegex).Groups[1].Captures.Cast<Capture>().ToArray();

        foreach (Match record in records)
        {
            //Due to how we captured the 3 groups, we have to put them back in order in one list.
            var columns = record.Groups.Cast<Group>()
                                .Skip(1)  //Groups[0] contains the entire record.
                                .SelectMany(group => group.Captures.Cast<Capture>()) //Flattens the captures of the three groups into one list
                                .OrderBy(capture => capture.Index) //SelectMany yields all s, then all n, then all h captures, so re-sort by position in the text.
                                .ToArray();

            DataRow row = table.NewRow();
            for (int i = 0; i < columns.Length; i++)
            {
                Type columnType = table.Columns[header[i].Value].DataType;
                if (columnType == typeof(String))
                {
                    row[header[i].Value] = columns[i].Value;
                }
                else if (columnType == typeof(Int32))
                {
                    row[header[i].Value] = Convert.ToInt32(columns[i].Value);
                }
                else if (columnType == typeof(Double))
                {
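                    //Parse with the invariant culture so "." is always the decimal separator, whatever the machine's locale.
                    row[header[i].Value] = Convert.ToDouble(columns[i].Value, System.Globalization.CultureInfo.InvariantCulture);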
                }
                else if (columnType == typeof(Boolean))
                {
                    if (columns[i].Value == "0")
                        row[header[i].Value] = false;
                    else if (columns[i].Value == "1")
                        row[header[i].Value] = true;
                    else
                        throw new InvalidDataException();
                }
                else if (columnType == typeof(Int16))
                {
                    row[header[i].Value] = Convert.ToInt16(columns[i].Value);
                }
                else if (columnType == typeof(Byte[]))
                {
                    row[header[i].Value] = StringToByteArray(columns[i].Value);
                }
                else
                {
                    throw new NotImplementedException();
                }

            }
            table.Rows.Add(row);
        }

        using (var bulkCopy = new SqlBulkCopy(conn))
        {
            bulkCopy.DestinationTableName = Path.GetFileName(tableFolder);
            bulkCopy.BulkCopyTimeout = 0;
            bulkCopy.WriteToServer(table);
        }
    }
}
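In the parser, header[i] is the i-th column name captured by headerRegex. The following minimal sketch (a C# 9+ top-level program) runs that pattern on a shortened header line; the three-column line is an assumption made for brevity. Because the capturing group sits inside a repeated (?:...)+ group, Groups[1].Captures holds one Capture per column name, in order.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// The header pattern used by the parser above.
const string headerRegex = @"^INSERT\sINTO\s\[[\w_\-\s]+\]\s\(\s(?:\[([\w_\-\s]+)\]\s(?:,\s)?)+\)";

// Hypothetical, shortened header: three of the real column names instead of the full list.
const string text = "INSERT INTO [AdminPrefs] ( [SpayClinic] , [VaxClinic] , [DateModified] )\r\nVALUES\r\n";

// Group 1 is repeated by the enclosing (?:...)+, so each column name becomes its own Capture.
string[] header = Regex.Match(text, headerRegex).Groups[1].Captures
    .Cast<Capture>()
    .Select(cap => cap.Value)
    .ToArray();

Console.WriteLine(string.Join(",", header)); // SpayClinic,VaxClinic,DateModified
```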


Update:

By renaming the capture groups so that they all share the same name, .NET's regex engine combines their captures into a single group for me, which simplifies

var columns = record.Groups.Cast<Group>().Skip(1).SelectMany(group => group.Captures.Cast<Capture>()).OrderBy(capture => capture.Index).ToArray();

to

var columns = record.Groups[1].Captures.Cast<Capture>().ToArray();
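The effect of reusing one group name can be seen in isolation. This is a minimal sketch (a C# 9+ top-level program); the pattern and input are simplified, hypothetical stand-ins for the real regex: three alternatives all capture into a group named c, and the captures come out as one collection, already in left-to-right order.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical simplified pattern: quoted text, digits, or X'hex', all captured as "c".
const string pattern = @"(?:'(?<c>[^']*)'|(?<c>\d+)|X'(?<c>[0-9A-F]*)')(?:\s,\s)?";
const string input = "7 , 'abc' , X'FF' , 42";

// With ExplicitCapture (as in the parser) only named groups capture, so "c" is group number 1.
Match m = Regex.Match(input, "^(?:" + pattern + ")+$", RegexOptions.ExplicitCapture);

// One collection, in match order: no SelectMany/OrderBy needed.
string[] byName = m.Groups["c"].Captures.Cast<Capture>().Select(cap => cap.Value).ToArray();
string[] byNumber = m.Groups[1].Captures.Cast<Capture>().Select(cap => cap.Value).ToArray();

Console.WriteLine(string.Join(",", byName));       // 7,abc,FF,42
Console.WriteLine(byName.SequenceEqual(byNumber)); // True
```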

Solution

Note that toggling the "CrLf marks a line ending" setting on the Regex Hero page causes the 8 lines to stop being matched; this is a clue as to what's causing the problem.

In your C# code, the line breaks inside the verbatim string literal come from the source file, which here encodes them as a CR/LF pair ("\r\n"). The $ in the regex (which matches an end-of-line in Multiline mode) only matches at the position just before a \n character (or at the very end of the input). Thus, there is an extra \r character between the final comma (or semicolon) and that position, which the regex doesn't account for, so the match fails.
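The difference can be reproduced without the long script. In this minimal sketch (a C# 9+ top-level program), the question's pattern and options are run over two tiny made-up inputs that differ only in their line endings.

```csharp
using System;
using System.Text.RegularExpressions;

// Two hypothetical "rows": identical except for CR/LF versus bare LF line endings.
string crlfText = "(1),\r\n(2);\r\n";
string lfText = "(1),\n(2);\n";

// The pattern and options from the question.
const string pattern = @"^\(.*?\)(,|;)$";
var options = RegexOptions.Singleline | RegexOptions.Multiline;

int crlfCount = Regex.Matches(crlfText, pattern, options).Count;
int lfCount = Regex.Matches(lfText, pattern, options).Count;

// In Multiline mode "$" only matches before "\n" (or at the end of the input),
// so the "\r" left after "," or ";" makes every match fail in the CR/LF text.
Console.WriteLine($"CR/LF matches: {crlfCount}"); // CR/LF matches: 0
Console.WriteLine($"LF matches: {lfCount}");      // LF matches: 2
```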

Some ways you could address this problem include:

  1. Strip the carriage returns: text = text.Replace("\r\n", "\n");, or
  2. Match the carriage returns: string query = @"^\(.*?\)(,|;)\r$";
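Both fixes can be checked side by side on a cut-down script. This is a sketch under assumptions (a C# 9+ top-level program): the two-column table [T] and its rows are made up, and the line breaks are written out explicitly as \r\n to mimic a verbatim literal saved with Windows line endings. The third pattern uses \r? so that it also tolerates bare-LF input, which is what the asker's final regex does with \r?$.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical stand-in for the question's script, with explicit CR/LF line breaks.
string text =
    "INSERT INTO [T] ( [A] , [B] )\r\n" +
    "VALUES\r\n" +
    "(1 , 2),\r\n" +
    "(3 , 4);\r\n";

var options = RegexOptions.Singleline | RegexOptions.Multiline;

// Fix 1: normalize CR/LF to LF, then use the original pattern unchanged.
int stripped = Regex.Matches(text.Replace("\r\n", "\n"), @"^\(.*?\)(,|;)$", options).Count;

// Fix 2: keep the text as-is and consume the carriage return explicitly.
int matchedCr = Regex.Matches(text, @"^\(.*?\)(,|;)\r$", options).Count;

// Variant of fix 2: "\r?" accepts both CR/LF and bare-LF line endings.
int optionalCr = Regex.Matches(text, @"^\(.*?\)(,|;)\r?$", options).Count;

Console.WriteLine($"{stripped} {matchedCr} {optionalCr}"); // 2 2 2
```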
