C# Regex Performance very slow


Question

I am very new to regex. I want to parse log files with the following regex:

(?<time>(.*?))[|](?<placeholder4>(.*?))[|](?<source>(.*?))[|](?<level>[1-3])[|](?<message>(.*?))[|][|][|](?<placeholder1>(.*?))[|][|](?<placeholder2>(.*?))[|](?<placeholder3>(.*))

A log line looks like this:

2001.07.13 09:40:20|1|SomeSection|3|====== Some log message::Type: test=sdfsdf|||.\SomeFile.cpp||60|-1

A log file with approx. 3000 lines takes a very long time to parse. Do you have any hints to speed it up? Thank you...

Update: I use regex because I parse different log files that do not share the same structure, and I use it this way:

string[] fileContent = File.ReadAllLines(filePath);
Regex pattern = new Regex(LogFormat.GetLineRegex(logFileFormat));

foreach (var line in fileContent)
{
   // Split log line
   Match match = pattern.Match(line);

   string logDate = match.Groups["time"].Value.Trim();
   string logLevel = match.Groups["level"].Value.Trim();
   // And so on...
}

Solution:
Thank you for the help. I tested the suggestions with the following results:

1.) Only added RegexOptions.Compiled:
From 00:01:10.9611143 to 00:00:38.8928387

2.) Used Thomas Ayoub's regex
From 00:00:38.8928387 to 00:00:06.3839097

3.) Used Wiktor Stribiżew's regex
From 00:00:06.3839097 to 00:00:03.2150095

Answer

Let me "convert" my comment into an answer since now I see what you can do about the regex performance.

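As I have mentioned above, replace all .*? with [^|]*, and also all repeating [|][|][|] with [|]{3} (or similar, depending on the number of [|]). A lazy .*? grows one character at a time and retries the rest of the pattern after each step, while [^|]* consumes a whole field in one pass and can never run past a delimiter. Also, do not use nested capturing groups, as every extra group the engine has to track influences performance too!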

var logFileFormat = @"(?<time>[^|]*)[|](?<placeholder4>[^|]*)[|](?<source>[^|]*)[|](?<level>[1-3])[|](?<message>[^|]*)[|]{3}(?<placeholder1>[^|]*)[|]{2}(?<placeholder2>[^|]*)[|](?<placeholder3>.*)";

Only the last .* can remain "wildcardish" since it will grab the rest of the line.
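
To check that the rewritten pattern still splits a line the same way, here is a minimal sketch that runs it against the sample line from the question. The class and method names (LogLineDemo, Parse) are made up for illustration and are not part of the original code.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LogLineDemo
{
    // Pattern from the answer: negated character classes instead of lazy .*?
    public const string Pattern =
        @"(?<time>[^|]*)[|](?<placeholder4>[^|]*)[|](?<source>[^|]*)[|](?<level>[1-3])[|]" +
        @"(?<message>[^|]*)[|]{3}(?<placeholder1>[^|]*)[|]{2}(?<placeholder2>[^|]*)[|](?<placeholder3>.*)";

    // Sample log line from the question.
    public const string Sample =
        @"2001.07.13 09:40:20|1|SomeSection|3|====== Some log message::Type: test=sdfsdf|||.\SomeFile.cpp||60|-1";

    // Hypothetical helper: applies the pattern to a single line.
    public static Match Parse(string line) => Regex.Match(line, Pattern);

    public static void Main()
    {
        Match m = Parse(Sample);
        Console.WriteLine(m.Groups["time"].Value);         // 2001.07.13 09:40:20
        Console.WriteLine(m.Groups["level"].Value);        // 3
        Console.WriteLine(m.Groups["placeholder1"].Value); // .\SomeFile.cpp
        Console.WriteLine(m.Groups["placeholder3"].Value); // -1
    }
}
```

Each [^|]* stops at the next pipe by construction, so the engine does not need to try alternative split points for a field.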

Here is a comparison of your and my regex patterns at RegexHero.
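
If you prefer to compare the two patterns locally, a rough sketch with Stopwatch could look like the following. It repeats the sample line 3000 times as a stand-in for a real log file, which is an assumption; the names PatternTiming and Measure are invented for this example, and the timings will differ from machine to machine.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Text.RegularExpressions;

public static class PatternTiming
{
    // Original pattern from the question (lazy quantifiers, nested groups).
    public const string LazyPattern =
        @"(?<time>(.*?))[|](?<placeholder4>(.*?))[|](?<source>(.*?))[|](?<level>[1-3])[|]" +
        @"(?<message>(.*?))[|][|][|](?<placeholder1>(.*?))[|][|](?<placeholder2>(.*?))[|](?<placeholder3>(.*))";

    // Rewritten pattern from the answer (negated character classes).
    public const string NegatedPattern =
        @"(?<time>[^|]*)[|](?<placeholder4>[^|]*)[|](?<source>[^|]*)[|](?<level>[1-3])[|]" +
        @"(?<message>[^|]*)[|]{3}(?<placeholder1>[^|]*)[|]{2}(?<placeholder2>[^|]*)[|](?<placeholder3>.*)";

    public const string Sample =
        @"2001.07.13 09:40:20|1|SomeSection|3|====== Some log message::Type: test=sdfsdf|||.\SomeFile.cpp||60|-1";

    // Hypothetical helper: counts matching lines and measures the elapsed time.
    public static (int Matches, TimeSpan Elapsed) Measure(string pattern, string[] lines)
    {
        var regex = new Regex(pattern);
        var watch = Stopwatch.StartNew();
        int matches = 0;
        foreach (string line in lines)
        {
            if (regex.Match(line).Success) matches++;
        }
        watch.Stop();
        return (matches, watch.Elapsed);
    }

    public static void Main()
    {
        // Assumption: 3000 copies of the sample line approximate the real file.
        string[] lines = Enumerable.Repeat(Sample, 3000).ToArray();
        var lazy = Measure(LazyPattern, lines);
        var negated = Measure(NegatedPattern, lines);
        Console.WriteLine($"lazy:    {lazy.Matches} matches in {lazy.Elapsed}");
        Console.WriteLine($"negated: {negated.Matches} matches in {negated.Elapsed}");
    }
}
```

Identical lines are a best case for both patterns, so a real log file with lines that fail to match is likely to show a bigger gap.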

Then, use RegexOptions.Compiled:

Regex pattern = new Regex(LogFormat.GetLineRegex(logFileFormat), RegexOptions.Compiled);
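
Put together, the parsing loop might look roughly like this. It is only a sketch: the pattern is hard-coded instead of coming from LogFormat.GetLineRegex (whose implementation is not shown in the question), the names CompiledLogParser and ParseLines are invented, and the match.Success check is an addition that the original loop does not have.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CompiledLogParser
{
    // Built once and reused for every line; RegexOptions.Compiled trades a
    // slower construction for faster matching afterwards.
    private static readonly Regex LinePattern = new Regex(
        @"(?<time>[^|]*)[|](?<placeholder4>[^|]*)[|](?<source>[^|]*)[|](?<level>[1-3])[|]" +
        @"(?<message>[^|]*)[|]{3}(?<placeholder1>[^|]*)[|]{2}(?<placeholder2>[^|]*)[|](?<placeholder3>.*)",
        RegexOptions.Compiled);

    // Hypothetical helper: returns the fields of every line that fits the format.
    public static List<(string Time, string Level, string Message)> ParseLines(IEnumerable<string> lines)
    {
        var result = new List<(string Time, string Level, string Message)>();
        foreach (string line in lines)
        {
            Match match = LinePattern.Match(line);
            if (!match.Success)
            {
                continue; // added check: skip lines that do not fit the format
            }
            result.Add((
                match.Groups["time"].Value.Trim(),
                match.Groups["level"].Value.Trim(),
                match.Groups["message"].Value.Trim()));
        }
        return result;
    }

    public static void Main()
    {
        var lines = new[]
        {
            @"2001.07.13 09:40:20|1|SomeSection|3|====== Some log message::Type: test=sdfsdf|||.\SomeFile.cpp||60|-1",
            "not a log line"
        };
        foreach (var entry in ParseLines(lines))
        {
            Console.WriteLine($"{entry.Time} [{entry.Level}] {entry.Message}");
        }
    }
}
```

Keeping the Regex in a static readonly field matters with RegexOptions.Compiled, because constructing a compiled regex is comparatively expensive and should not happen per line.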
