OutOfMemoryException in Regex Matches when processing large files


Question

I've got an exception log from one of our production code releases.

System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
   at System.Text.RegularExpressions.Match..ctor(Regex regex, Int32 capcount, String text, Int32 begpos, Int32 len, Int32 startpos)
   at System.Text.RegularExpressions.RegexRunner.InitMatch()
   at System.Text.RegularExpressions.RegexRunner.Scan(Regex regex, String text, Int32 textbeg, Int32 textend, Int32 textstart, Int32 prevlen, Boolean quick)
   at System.Text.RegularExpressions.Regex.Run(Boolean quick, Int32 prevlen, String input, Int32 beginning, Int32 length, Int32 startat)
   at System.Text.RegularExpressions.MatchCollection.GetMatch(Int32 i)
   at System.Text.RegularExpressions.MatchEnumerator.MoveNext()

The data it tries to process was about 800KB.

In my local tests it works perfectly fine. Have you ever seen similar behaviour? What could be the cause?

Should I split the text before processing it? Obviously, in that case the regex might not match, because the original file would be split at an arbitrary position.
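
Splitting at arbitrary offsets can indeed cut a match in half, but splitting at line breaks is safe for any pattern that cannot match a newline. In the regex from the question, none of the parts (the space/quote/`=`/`.` prefixes, `[\/|\?]`, the `[\w...]` character class, `%[0-9a-f]{2}`, and the space/quote/`.` suffixes) can match `\n`, so no match can straddle a line break. The sketch below assumes that property and reads the input one line at a time instead of holding the whole file in one string. It is written in Python, with `re` standing in for .NET's `Regex`; `LINK_LIKE` and `matches_by_line` are hypothetical names, and `LINK_LIKE` is a simplified stand-in, not the original pattern. In .NET the same loop would read lines with `StreamReader.ReadLine()`.

```python
import io
import re
from typing import Iterable, Iterator, Tuple

# Hypothetical, simplified stand-in for the question's pattern. Like the
# original, none of its parts can match "\n", so matches never cross lines.
LINK_LIKE = re.compile(r"[ =\"'][/?][\w/.%-]*")


def matches_by_line(pattern: re.Pattern, lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Scan input one line at a time and yield (line_number, match_text)."""
    for lineno, line in enumerate(lines, start=1):
        for m in pattern.finditer(line):
            if m.group(0):  # skip zero-width matches, if the pattern allows them
                yield lineno, m.group(0)


# io.StringIO stands in for open(path, encoding="utf-8"); iterating a text
# file object yields one line at a time rather than the whole file.
sample = io.StringIO('see /docs/intro here\nno links\nhref="/a/b.html"\n')
print(list(matches_by_line(LINK_LIKE, sample)))
# [(1, ' /docs/intro'), (3, '"/a/b.html')]
```

Only one line is in memory at a time, so the input size no longer dictates the size of the string handed to the regex engine. This only holds while the pattern cannot match a newline; a pattern that can would need overlapping chunks instead.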

I think this particular regex is causing the problem; when I test it in an isolated environment, it eats up memory instantly. Here it is:

((?:( |\.\.|\.|""|'|=)[\/|\?](?:[\w#!:\.\?\+=&@!$'~*,;\/\(\)\[\]\-]|%[0-9a-f]{2})*)( |\.|\.\.|""|'| ))?
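
One property of this pattern worth checking (an observation about the pattern, not something the answer below mentions): the whole expression is wrapped in `(...)?`, so it can also succeed with a zero-length match. Iterating over `Regex.Matches` then yields an empty match at every position where no URL-like fragment starts, so an 800 KB input can produce on the order of 800,000 `Match` objects, and the stack trace above fails while `MatchCollection.GetMatch` is producing matches. Whether this is the cause of this particular OOM is an assumption. The sketch below counts those matches using Python's `re` as a stand-in for .NET (both report empty matches and advance one character past them); it assumes the doubled `""` in the original is C# verbatim-string escaping for a single `"`, and `count_matches` is a hypothetical helper.

```python
import re
from typing import Tuple

# Python transcription of the question's regex. Assumption: each "" in the
# original is C# verbatim-string escaping, so it is written here as a single ".
URL_LIKE = re.compile(
    r"""((?:( |\.\.|\.|"|'|=)[\/|\?](?:[\w#!:\.\?\+=&@!$'~*,;\/\(\)\[\]\-]|%[0-9a-f]{2})*)( |\.|\.\.|"|'| ))?"""
)

# Same pattern without the trailing "?": the outer group is now mandatory.
STRICT = re.compile(URL_LIKE.pattern[:-1])


def count_matches(pattern: re.Pattern, text: str) -> Tuple[int, int]:
    """Return (total matches, zero-width matches) found by finditer."""
    total = empty = 0
    for m in pattern.finditer(text):
        total += 1
        if m.start() == m.end():
            empty += 1
    return total, empty


text = "plain text without links " * 4  # 100 characters, no URL-like fragments
print(count_matches(URL_LIKE, text))  # (101, 101): one empty match per position
print(count_matches(STRICT, text + ' /a/b "'))  # (1, 0): only the real match
```

If the optional outer group is not intentional, dropping the trailing `?` means the engine only reports real matches, so the number of `Match` objects tracks the number of URL-like fragments rather than the length of the input.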

Edit

I was wrong about my local test. I was loading a big string and then appending to it, which strains the .NET Framework's memory handling; the OutOfMemoryException then surfaced during the regex rather than during the string operations (or at random points, so ignore what I said earlier).

This is a .NET Framework 2.0 application.

Answer

Without seeing your regex I can't say for sure, but you can sometimes get problems like this because your quantifiers are greedy instead of lazy.

The regex engine has to store a lot of information internally, and greedy matches can end up making it select large sections of your 800 KB string many times over.
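
To illustrate the greedy/lazy distinction, here is a small sketch using Python's `re` as a stand-in for .NET's `Regex` (both treat `*` as greedy and `*?` as lazy). The HTML-like sample string is made up for illustration, and the example shows how much text each quantifier consumes, not memory use directly.

```python
import re

text = '<a href="x">one</a> <a href="y">two</a>'

# Greedy: .* first runs to the end of the line, then backtracks to the LAST ">".
greedy = re.findall(r"<a.*>", text)

# Lazy: .*? expands one character at a time and stops at the FIRST ">".
lazy = re.findall(r"<a.*?>", text)

print(greedy)  # ['<a href="x">one</a> <a href="y">two</a>']
print(lazy)    # ['<a href="x">', '<a href="y">']
```

Because `.` does not match a newline by default, a greedy `.*` runs to the end of the current line before backtracking, so on very long lines each match attempt walks over a lot of text; the lazy form stops as early as the rest of the pattern allows.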

There's some good information about this over here.