Strange EOutOfMemory exception using TStringList


Question

I have a system that loads text files that are zipped into a ".log" file and parses them into informational classes using multiple threads. Each thread handles a different file and adds the parsed objects to a list. Each file is loaded with TStringList, since that was the fastest method I tested.

The number of text files varies, but normally I have to deal with 5 to 8 files, ranging from 50 MB to 120 MB, in one run.

My problem: the user can load .log files as many times as they want, and after some of those runs I receive an EOutOfMemory exception when calling TStringList.LoadFromFile. Of course, the first thing that comes to mind for anyone who has ever used a TStringList is that you should not use it for big text files. But this exception happens randomly, and only after the process has already completed successfully at least once. The objects are destroyed before a new parse starts, so the memory is reclaimed correctly apart from some minor leaks.

I tried using TextFile and TStreamReader, but neither is as fast as TStringList, and the duration of the process is the greatest concern with this feature.

I'm using Delphi 10.1 Berlin. The parsing process is a simple iteration through the list of variable-length lines, constructing objects based on each line's content.

Essentially, my question is: what is causing this, and how can I fix it? I may use other ways to load the file and read its contents, but they must be as fast as the TStringList method, or faster.

The loading thread's code:

TThreadFactory= class(TThread)
  protected
     // Class that holds the list of Commands already parsed, is owned outside of the thread
    _logFile: TLogFile;
    _criticalSection: TCriticalSection;
    _error: string;

    procedure Execute; override;
    destructor Destroy; override;

  public
    constructor Create(AFile: TLogFile; ASection: TCriticalSection); overload;

    property Error: string read _error;

  end;

implementation


{ TThreadFactory}

    constructor TThreadFactory.Create(AFile: TLogFile; ASection: TCriticalSection);
    begin
      inherited Create(True);
      _logFile := AFile;

      _criticalSection := ASection;
    end;


    procedure TThreadFactory.Execute;
    var
      tmpLogFile: TStringList;
      tmpConvertedList: TList<TLogCommand>;
      tmpCommand: TLogCommand;
      tmpLine: string;
      i: Integer;
    begin
      try
        try
          tmpConvertedList := TList<TLogCommand>.Create;

          // _path and tmpCaminho are not declared in this simplified excerpt
          if (_path <> '') and not Terminated then
          begin
            try
              tmpLogFile := TStringList.Create;
              tmpLogFile.LoadFromFile(tmpCaminho);

              for tmpLine in tmpLogFile do
              begin
                if Terminated then
                  Break;

                if tmpLine <> '' then
                begin
                  // the logic here was simplified; this is the essence of it
                  tmpConvertedList.Add(TLogCommand.Create(tmpLine));
                end;
              end;
            finally
              tmpLogFile.Free;
            end;
          end;

          _criticalSection.Acquire;

          _logFile.AddCommands(tmpConvertedList);
        finally
          _criticalSection.Release;

          FreeAndNil(tmpConvertedList);
        end;
      except
        on e: Exception do
          _error := e.Message;
      end;
    end;

    end.     

Added: Thank you for all your feedback. I will address some issues that were discussed but that I failed to mention in my initial question.

  • The .log file contains multiple .txt files, but it can also contain multiple .log files. Each file represents a day's worth of logging or a period selected by the user. Because decompression takes a lot of time, a thread is started every time a .txt is found so that parsing can begin immediately. This has shortened the noticeable waiting time for the user.

  • The "minor leaks" are not shown by ReportMemoryLeaksOnShutdown, and other methods such as TStreamReader avoid this issue.

  • The list of commands is held by TLogFile. There is only one instance of this class at any time, and it is destroyed whenever the user wants to load a .log file. All threads add commands to the same object, which is the reason for the critical section.

  • I can't detail the parsing process since it would disclose some sensitive information, but it is simple information gathering from the string and the TCommand.

  • I have been aware of fragmentation from the beginning, but I never found concrete proof that TStringList causes fragmentation just by loading files multiple times. If this can be confirmed, I would be very glad.

Thank you for your attention. I ended up using an external library that can read lines and load files as fast as TStringList without loading the whole file into memory:

https://github.com/d-mozulyov/CachedTexts/tree/master/lib

Recommended answer

  1. TStringList is a slow class per se. It has a lot of bells and whistles, extra features and functions that bog it down. Much faster containers would be TList<String> or a plain old dynamic array of string. See the System.IOUtils.TFile.ReadAllLines function.

  2. Read about heap memory fragmentation, for example http://en.wikipedia.org/Heap_fragmentation

     It can happen and break your application even without memory leaks. But since you say there are many small leaks, that is most probably what is happening. You can more or less delay the crash by not reading whole files into memory and working with smaller chunks instead. The degradation would still go on, only more slowly, and in the end your program would crash again.

  3. There are a lot of ad hoc class libraries that read large files piece by piece, with buffering, prefetching and whatnot. One such library, targeted at texts, is http://github.com/d-mozulyov/CachedTexts and there are others too.
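As a rough sketch of two of the alternatives named in the list above, the program below counts non-empty lines once with TFile.ReadAllLines (whole file in a dynamic array) and once with TStreamReader (line by line, so only the reader's buffer stays in memory). The file name sample.log, the 64 KB buffer size and the two CountNonEmptyLines helpers are illustrative assumptions and do not come from the question. The question reports that TStreamReader was slower in its tests, so whether a larger buffer closes that gap is something to measure rather than assume.

```pascal
program ReadLinesSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.Types, System.IOUtils;

// Loads the whole file into a dynamic array of string, then counts.
function CountNonEmptyLinesAllAtOnce(const AFileName: string): Integer;
var
  Lines: TStringDynArray;
  Line: string;
begin
  Result := 0;
  Lines := TFile.ReadAllLines(AFileName);
  for Line in Lines do
    if Line <> '' then
      Inc(Result);
end;

// Streams the file line by line; the whole file is never held in memory.
function CountNonEmptyLinesStreamed(const AFileName: string): Integer;
var
  Reader: TStreamReader;
begin
  Result := 0;
  // 64 KB buffer is an arbitrary choice for this sketch
  Reader := TStreamReader.Create(AFileName, TEncoding.UTF8, True, 64 * 1024);
  try
    while not Reader.EndOfStream do
      if Reader.ReadLine <> '' then
        Inc(Result);
  finally
    Reader.Free;
  end;
end;

const
  SampleFile = 'sample.log'; // hypothetical file name

begin
  TFile.WriteAllText(SampleFile,
    'first' + sLineBreak + sLineBreak + 'second' + sLineBreak);
  try
    Writeln(CountNonEmptyLinesAllAtOnce(SampleFile));
    Writeln(CountNonEmptyLinesStreamed(SampleFile));
  finally
    TFile.Delete(SampleFile);
  end;
end.
```

The streamed version trades one large allocation per file for many small, short-lived strings, which is the property that matters if fragmentation from repeated large loads is the suspected cause.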

PS. General notes.

I think your team should reconsider how much you really need multithreading. Frankly, I see no need. You are loading files from an HDD and probably writing the processed and transformed files to the same HDD (or at best to another one). That means your program's speed is limited by disk speed, and that speed is MUCH lower than the speed of the CPU and RAM. By introducing multithreading you seem only to make your program more complex and fragile. Errors are much harder to detect, well-known libraries may suddenly misbehave in multithreaded mode, and so on. And you probably get no performance increase, because the bottleneck is disk I/O speed.
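Whether loading or parsing dominates can be checked rather than guessed. The sketch below times the two phases separately with TStopwatch. MeasureLoadAndParse is a hypothetical helper, and the non-empty-line count merely stands in for the real parsing work, which the question does not show. Note that the load time covers decoding and line splitting as well as disk reads, and that a file already in the operating system's cache will load much faster than a cold one.

```pascal
program LoadVsParseTiming;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.Diagnostics;

// Loads AFileName with TStringList, then walks the lines.
// Returns the number of non-empty lines and reports both durations.
function MeasureLoadAndParse(const AFileName: string;
  out ALoadMs, AParseMs: Int64): Integer;
var
  Lines: TStringList;
  Watch: TStopwatch;
  Line: string;
begin
  Result := 0;
  Lines := TStringList.Create;
  try
    Watch := TStopwatch.StartNew;
    Lines.LoadFromFile(AFileName);
    ALoadMs := Watch.ElapsedMilliseconds;

    Watch := TStopwatch.StartNew;
    for Line in Lines do
      if Line <> '' then
        Inc(Result); // stand-in for the real parsing work
    AParseMs := Watch.ElapsedMilliseconds;
  finally
    Lines.Free;
  end;
end;

var
  LoadMs, ParseMs: Int64;
  Count: Integer;

begin
  if ParamCount < 1 then
  begin
    Writeln('usage: LoadVsParseTiming <file>');
    Exit;
  end;
  Count := MeasureLoadAndParse(ParamStr(1), LoadMs, ParseMs);
  Writeln(Format('load: %d ms, parse: %d ms, non-empty lines: %d',
    [LoadMs, ParseMs, Count]));
end.
```

If the parse time turns out to be a large share of the total, extra threads may still pay off; if the load time dominates, they probably will not.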

If you still want multithreading for its own sake, then perhaps look into OmniThreadLibrary. It was designed to simplify the development of "data stream" types of multithreaded applications. Read the tutorials and examples.

I definitely suggest that you squash all those "minor leaks" and, as part of that, fix all compilation warnings. I know it is hard when you are not the only programmer on the project and the others do not care. Still, "minor leaks" means that no one on your team knows how the program actually behaves or behaved. And non-deterministic random behavior in a multithreaded environment can easily generate tons of random Schrödinbugs that you would never be able to reproduce and fix.

Your try-finally pattern really is broken. The variable you clean up in the finally block should be assigned right before the try block, not within it!

o := TObject.Create;
try
  ....
finally
  o.Destroy;
end;

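This is the proper way:

  • Either the object creation fails, in which case the try-block is not entered and neither is the finally-block.
  • Or the object is created successfully, in which case the try-block is entered and so is the finally-block.

So, sometimes: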

o := nil;
try
  o := TObject.Create;
  ....
finally
  o.Free;
end;

This is also correct. The variable is set to nil immediately before the try-block is entered. If object creation fails, the variable still holds nil when the finally-block calls Free, and TObject.Free (but NOT TObject.Destroy) was designed to work on nil object references. By itself this is just a noisy, overly verbose modification of the first pattern, but it serves as a foundation for a few more derivatives.
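A minimal sketch of that behavior, assuming a hypothetical TFailing class whose constructor can be told to raise: when the constructor fails, the assignment never happens, the variable stays nil, and Free in the finally-block does nothing.

```pascal
program FreeOnNilSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

type
  // Hypothetical class used only to simulate a failing constructor.
  TFailing = class
  public
    constructor Create(AFail: Boolean);
  end;

constructor TFailing.Create(AFail: Boolean);
begin
  inherited Create;
  if AFail then
    raise Exception.Create('creation failed');
end;

// Returns True when the object was created (and then freed),
// False when the constructor raised.
function CreateAndFree(AFail: Boolean): Boolean;
var
  Obj: TFailing;
begin
  Result := False;
  try
    Obj := nil;
    try
      Obj := TFailing.Create(AFail);
      Result := True;
    finally
      Obj.Free; // Obj is still nil if the constructor raised: a no-op
    end;
  except
    on E: Exception do
      Result := False; // swallowed only to keep the demo running
  end;
end;

begin
  Writeln(CreateAndFree(False));
  Writeln(CreateAndFree(True));
end.
```

Replacing Obj.Free with Obj.Destroy in this sketch would call a destructor on a nil reference in the failing case, which is exactly what the pattern is meant to avoid.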

That pattern may be used when you do not know in advance whether you will create the object or not.

o := nil;
try
  ...
  if SomeConditionCheck() 
     then o := TObject.Create;  // but maybe not
  ....
finally
  o.Free;
end;

Or when object creation is delayed, either because you need to calculate some data for its creation, or because the object is very heavy (for example, it globally blocks access to some file) so you strive to keep its lifetime as short as possible.

o := nil;
try
  ...some code that may raise errors
  o := TObject.Create; 
  ....
finally
  o.Free;
end;

That code, though, raises the question of why the "...some code" was not moved outside and before the try-block. Usually it can and should be. This is a rather rare pattern.

One more derivative of that pattern is used when creating several objects:

o1 := nil;
o2 := nil;
o3 := nil;
try
  o2 := TObject.Create;
  o3 := TObject.Create;
  o1 := TObject.Create;
  ....
finally
  o3.Free;
  o2.Free;
  o1.Free;
end;

The goal is that if, for example, the creation of o3 fails, then o2 gets freed and o1, which was never created, is still nil, so the Free calls in the finally-block handle every case correctly.

That is semi-correct. It assumes that destroying an object never raises an exception of its own. Usually that assumption holds, but not always. Anyway, this pattern lets you fuse several try-finally blocks into one, which makes the source code shorter (easier to read and reason about) and execution a little bit faster. Usually this is also reasonably safe, but not always.
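For comparison, here is a sketch of the unfused form, with one try-finally per object. It is longer, but it does not depend on destructors never raising: the outer finally still runs if the inner Free fails. JoinTwoLists and its contents are made up for illustration.

```pascal
program NestedFinallySketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

// Each object is created immediately before the try that guards it.
function JoinTwoLists: string;
var
  First, Second: TStringList;
begin
  First := TStringList.Create;
  try
    Second := TStringList.Create;
    try
      First.Add('a');
      Second.Add('b');
      Result := First[0] + Second[0];
    finally
      Second.Free; // runs even if the body raised
    end;
  finally
    First.Free; // runs even if Second.Free raised
  end;
end;

begin
  Writeln(JoinTwoLists);
end.
```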

Now two typical misuses of the pattern:

o := TObject.Create;
..... some extra code here
try
  ....
finally
  o.Destroy;
end;

If the code BETWEEN object creation and the try-block raises an error, there is no one left to free that object. You just got a memory leak.

When you read the Delphi sources, you may see a similar pattern there:

with TObject.Create do
try
  ....some very short code
finally
  Destroy;
end;

For all the widespread zeal against any use of the with construct, this pattern does preclude adding extra code between object creation and the try guard. The typical drawbacks of with still apply: possible namespace collisions, and the inability to pass this anonymous object to other functions as an argument.

One more unlucky modification:

o := nil;
..... some extra code here
..... that does never change o value
..... and our fortuneteller warrants never it would become
..... we know it for sure
try
  ....
  o := TObject.Create;
  ....
finally
  o.Free;
end;

This pattern is technically correct, but rather fragile. You do not immediately see the link between the o := nil line and the try-block. When you develop the program further in the future, you may easily forget it and introduce errors, such as copy-pasting or moving the try-block into another function and forgetting the nil initialization, or extending the in-between code and making it use (and thus change) the value of o. There is one case where I sometimes use it, but it is very rare and comes with risks.

And now:

...some random code here that does not
...initialize o variable, so the o contains
...random memory garbage here
try
  o := TObject.Create;
  ....
finally
  o.Destroy; // or o.Free
end;

This is what you write a lot without thinking about how try-finally works and why it was invented. The problem is simple: when you enter the try-block, your o variable is a container of random garbage. When you then try to create the object, an error may be raised. What then? Then you go into the finally-block and call (random-garbage).Free. And what would that do? It would do random garbage.

So, to repeat all of the above:

  1. try-finally is used to guarantee object freeing or any other cleanup (closing files, closing windows, etc.), and hence:
  2. The variable used to track that resource (such as an object reference) should have a well-known value on entry into the try-block, so it should be assigned (initialized) before the try keyword. If you guard a file, open it immediately before try. If you guard against a memory leak, create the object before try. And so on. Do not do your first initialization AFTER the try keyword, WITHIN the try-block; it is too late there.
  3. You had better design the code to be as simple (self-evident) as you can, eliminating the potential to introduce future errors when you forget the non-explicit hidden assumptions you keep in the corner of your mind today and end up violating them. See Who wrote this programing saying? "Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live." Here it means: initialize (assign) the variable guarded by the try-block IMMEDIATELY before the block starts, right above the try keyword. Better still, insert an empty line before that assignment. Make it jump out at you (or any other reader) that this variable and this try are mutually dependent and should never be broken apart.
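Applied to the question's loading code, the rules above might look like the following sketch. It is not the asker's actual code: TLogCommand here is a hypothetical stand-in, the shared TLogFile is replaced by a plain TObjectList<TLogCommand>, and ParseFileInto represents what the body of the thread's Execute could do. Each resource, including the critical section, is acquired on the line directly above the try that releases it.

```pascal
program GuardedParseSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.SyncObjs, System.IOUtils,
  System.Generics.Collections;

type
  // Hypothetical stand-in for the question's TLogCommand.
  TLogCommand = class
  public
    Text: string;
    constructor Create(const ALine: string);
  end;

constructor TLogCommand.Create(const ALine: string);
begin
  inherited Create;
  Text := ALine;
end;

// Parses one file and appends the result to the shared ATarget list.
procedure ParseFileInto(const APath: string;
  ATarget: TObjectList<TLogCommand>; ASection: TCriticalSection);
var
  Lines: TStringList;
  Converted: TObjectList<TLogCommand>;
  Line: string;
begin
  Converted := TObjectList<TLogCommand>.Create(True);
  try
    Lines := TStringList.Create;
    try
      Lines.LoadFromFile(APath);
      for Line in Lines do
        if Line <> '' then
          Converted.Add(TLogCommand.Create(Line));
    finally
      Lines.Free;
    end;

    ASection.Acquire;
    try
      ATarget.AddRange(Converted);
      Converted.OwnsObjects := False; // ATarget owns the commands now
    finally
      ASection.Release;
    end;
  finally
    Converted.Free; // frees any commands it still owns after a failure
  end;
end;

const
  SampleFile = 'sample.txt'; // hypothetical file name

var
  Commands: TObjectList<TLogCommand>;
  Section: TCriticalSection;

begin
  TFile.WriteAllText(SampleFile, 'one' + sLineBreak + sLineBreak + 'two');
  Commands := TObjectList<TLogCommand>.Create(True);
  try
    Section := TCriticalSection.Create;
    try
      ParseFileInto(SampleFile, Commands, Section);
      Writeln(Commands.Count);
    finally
      Section.Free;
    end;
  finally
    Commands.Free;
    TFile.Delete(SampleFile);
  end;
end.
```

In the question's version, Release sits in a finally-block whose try can fail before Acquire is ever reached, so the section may be released without having been acquired. Pairing Acquire with its own try removes that case.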
