TStringList.LoadFromFile - Exceptions with Large Text Files


Question

I'm running Delphi RAD Studio XE2.

I have some very large files, each containing a large number of lines. The lines themselves are small: just 3 tab-separated doubles. I want to load a file into a TStringList using TStringList.LoadFromFile, but this raises an exception with large files.

For files of 2 million lines (approximately 1GB) I get the EIntOverflow exception. For larger files (20 million lines and approximately 10GB, for example) I get the ERangeCheck exception.

I have 32GB of RAM to play with and am just trying to load this file and use it quickly. What's going on here, and what other options do I have? Could I use a file stream with a large buffer to load this file into a TStringList? If so, could you please provide an example?

Recommended answer

When Delphi switched to Unicode in Delphi 2009, the TStrings.LoadFromStream() method (which TStrings.LoadFromFile() calls internally) became very inefficient for large streams/files.

Internally, LoadFromStream() reads the entire file into memory as a TBytes, then converts that to a UnicodeString using TEncoding.GetString() (which decodes the bytes into a TCharArray, copies that into the final UnicodeString, and then frees the array), then parses the UnicodeString (while the TBytes is still in memory) adding substrings into the list as needed.

So, just prior to LoadFromStream() exiting, there are four copies of the file data in memory: three copies taking up, at worst, filesize * 3 bytes of memory (where each copy uses its own contiguous memory block plus some memory manager overhead), and one copy for the parsed substrings! Granted, the first three copies are freed when LoadFromStream() actually exits. But this explains why you are getting memory errors before reaching that point: LoadFromStream() is trying to use 3-4 GB of memory to load a 1GB file, and the RTL's memory manager cannot handle that.

If you want to load the content of a large file into a TStringList, you are better off using TStreamReader instead of LoadFromFile(). TStreamReader uses a buffered file I/O approach to read the file in small chunks. Simply call its ReadLine() method in a loop, Add()'ing each line to the TStringList. For example:

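// Replaces: MyStringList.LoadFromFile(filename);
// Reader is a TStreamReader; the second argument (true) enables BOM detection.
Reader := TStreamReader.Create(filename, true);
try
  // Suspend change notifications while the list is being filled.
  MyStringList.BeginUpdate;
  try
    MyStringList.Clear;
    // Read one line at a time instead of loading the whole file at once.
    while not Reader.EndOfStream do
      MyStringList.Add(Reader.ReadLine);
  finally
    MyStringList.EndUpdate;
  end;
finally
  Reader.Free;
end;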

Maybe some day, LoadFromStream() might be re-written to use TStreamReader internally like this.
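
The question also asked about reading with a larger buffer. As a minimal sketch, assuming the TStreamReader constructor overload that takes (Filename, Encoding, DetectBOM, BufferSize), the same loop can be wrapped in a helper that passes an explicit buffer size. The names LoadLinesBuffered and LoadLargeFile, the 64 KB default, and the use of TEncoding.Default are illustrative assumptions, not part of the RTL or of the original answer.

```pascal
program LoadLargeFile;

{$APPTYPE CONSOLE}

uses
  System.Classes, System.SysUtils;

// Hypothetical helper (not an RTL routine): fills Lines from FileName
// line by line, using a caller-chosen read buffer size in bytes.
procedure LoadLinesBuffered(const FileName: string; Lines: TStrings;
  BufferSize: Integer = 64 * 1024);
var
  Reader: TStreamReader;
begin
  // Assumption: the file is in the default ANSI encoding unless a BOM
  // says otherwise (DetectBOM = True).
  Reader := TStreamReader.Create(FileName, TEncoding.Default, True, BufferSize);
  try
    Lines.BeginUpdate;
    try
      Lines.Clear;
      while not Reader.EndOfStream do
        Lines.Add(Reader.ReadLine);
    finally
      Lines.EndUpdate;
    end;
  finally
    Reader.Free;
  end;
end;

var
  Lines: TStringList;
begin
  Lines := TStringList.Create;
  try
    // The file name is taken from the first command-line argument.
    LoadLinesBuffered(ParamStr(1), Lines);
    Writeln(Lines.Count, ' lines loaded');
  finally
    Lines.Free;
  end;
end.
```

A larger buffer only reduces the number of read calls. Every line still ends up in memory inside the TStringList, so a file of several GB would presumably still need a 64-bit build target to load completely.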
