C# serialize large array to disk
Question
I have a very large graph stored in a single-dimensional array (about 1.1 GB), which I am able to hold in memory on my machine, which runs Windows XP with 2 GB of RAM and 2 GB of virtual memory. I am able to generate the entire data set in memory; however, when I try to serialize it to disk using the BinaryFormatter, the file size gets to about 50 MB and then I get an out-of-memory exception. The code I am using to write this is the same code I use for all of my smaller problems:
StateInformation[] diskReady = GenerateStateGraph();
BinaryFormatter bf = new BinaryFormatter();
using (Stream file = File.OpenWrite(@"C:\temp\states.dat"))
{
    bf.Serialize(file, diskReady);
}
The search algorithm is very lightweight, and I am able to perform searches on this graph with no problems once it is in memory.
I really have 3 questions:
- Is there a more reliable way to write a large data set to disk? I guess you can define "large" as a data set whose size approaches the amount of available memory, though I am not sure how accurate that is.
- Should I move to a more database-centric approach?
- Can anyone point me to some literature on reading portions of a large data set from a disk file in C#?
Recommended answer
Write the entries to the file yourself. One simple solution would be:
StateInformation[] diskReady = GenerateStateGraph();
BinaryFormatter bf = new BinaryFormatter();
// File.Create truncates an existing file; File.OpenWrite would leave stale bytes at the end.
using (Stream file = File.Create(@"C:\temp\states.dat"))
{
    foreach (StateInformation si in diskReady)
    {
        using (MemoryStream ms = new MemoryStream())
        {
            // Serialize one entry at a time, not the whole array.
            bf.Serialize(ms, si);
            byte[] ser = ms.ToArray();
            int len = ser.Length;
            // Write the length as a 4-byte prefix, least significant byte first.
            file.WriteByte((byte)(len & 0xFF));
            file.WriteByte((byte)((len >> 8) & 0xFF));
            file.WriteByte((byte)((len >> 16) & 0xFF));
            file.WriteByte((byte)((len >> 24) & 0x7F));
            file.Write(ser, 0, len);
        }
    }
}
No more memory than a single serialised StateInformation object requires is needed at a time. To deserialise, you read four bytes, construct the length, create a buffer of that size, fill it, and deserialise.
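The read side described here can be sketched roughly as follows. This is only an illustration, assuming the four length bytes are stored least significant byte first, as the writer intends. The names LengthPrefixedReader, ReadRecords and ReadAll are hypothetical, and the deserialisation step is passed in as a delegate so the sketch does not depend on any particular formatter.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of the original answer.
public static class LengthPrefixedReader
{
    // Fills buffer[0..count) or throws if the stream ends early.
    private static void ReadFully(Stream s, byte[] buffer, int count)
    {
        int offset = 0;
        while (offset < count)
        {
            int n = s.Read(buffer, offset, count - offset);
            if (n == 0) throw new EndOfStreamException("Truncated record.");
            offset += n;
        }
    }

    // Yields the raw bytes of each record, one record at a time.
    public static IEnumerable<byte[]> ReadRecords(Stream file)
    {
        while (true)
        {
            int b0 = file.ReadByte();
            if (b0 < 0) yield break;            // clean end of file
            int b1 = file.ReadByte();
            int b2 = file.ReadByte();
            int b3 = file.ReadByte();
            if ((b1 | b2 | b3) < 0)
                throw new EndOfStreamException("Truncated length prefix.");

            // Rebuild the length from its four little-endian bytes.
            int len = b0 | (b1 << 8) | (b2 << 16) | (b3 << 24);
            if (len < 0) throw new InvalidDataException("Negative record length.");

            byte[] buffer = new byte[len];
            ReadFully(file, buffer, len);
            yield return buffer;
        }
    }

    // Deserialises each record with a caller-supplied function.
    public static IEnumerable<T> ReadAll<T>(Stream file, Func<Stream, T> deserialize)
    {
        foreach (byte[] record in ReadRecords(file))
            using (MemoryStream ms = new MemoryStream(record))
                yield return deserialize(ms);
    }
}
```

With the formatter used in the answer, the delegate would be something like `ms => (StateInformation)bf.Deserialize(ms)`. Note that BinaryFormatter is obsolete in current .NET releases, so this pairing assumes a .NET Framework-era project like the one in the question.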
All of the above could be seriously optimised for speed, memory use and disk size if you create a more specialised format, but it goes to show the principle.
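As one possible shape for such a specialised format, here is a minimal sketch using fixed-size records written with BinaryWriter. The StateRecord type and its fields (Id, Parent, Cost) are purely hypothetical stand-ins, since the real StateInformation layout is not shown in the question. The sketch also assumes every state can be reduced to a fixed number of bytes.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for StateInformation; the real fields are unknown.
public struct StateRecord
{
    public int Id;
    public int Parent;
    public double Cost;
}

// Hypothetical helper illustrating a fixed-size record layout.
public static class FixedRecordFile
{
    // Two 4-byte ints plus one 8-byte double.
    public const int RecordSize = 4 + 4 + 8;

    // Writes every record back to back, with no per-record header.
    public static void WriteAll(Stream output, StateRecord[] states)
    {
        BinaryWriter w = new BinaryWriter(output);
        foreach (StateRecord s in states)
        {
            w.Write(s.Id);
            w.Write(s.Parent);
            w.Write(s.Cost);
        }
        w.Flush(); // the caller keeps ownership of the stream
    }

    // Reads a single record by index without touching the rest of the file.
    public static StateRecord ReadAt(Stream input, long index)
    {
        input.Seek(index * RecordSize, SeekOrigin.Begin);
        BinaryReader r = new BinaryReader(input);
        StateRecord s;
        s.Id = r.ReadInt32();
        s.Parent = r.ReadInt32();
        s.Cost = r.ReadDouble();
        return s;
    }
}
```

Because every record has the same size, record i starts at byte offset i * RecordSize, so a portion of the data set can be read by seeking directly to it. That trade-off only works if the state has no variable-length members; otherwise a length prefix or a separate index is still needed.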