C# serialize large array to disk


Question

I have a very large graph stored in a one-dimensional array (about 1.1 GB), which fits in memory on my machine running Windows XP with 2 GB of RAM and 2 GB of virtual memory. I am able to generate the entire data set in memory, but when I try to serialize it to disk using the BinaryFormatter, the file grows to about 50 MB and then I get an out-of-memory exception. The code I use to write it is the same code I use for all of my smaller problems:

StateInformation[] diskReady = GenerateStateGraph();
BinaryFormatter bf = new BinaryFormatter();
using (Stream file = File.OpenWrite(@"C:\temp\states.dat"))
{
    bf.Serialize(file, diskReady);
}

The search algorithm is very lightweight, and I am able to perform searches on this graph with no problems once it is in memory.

I really have 3 questions:

  1. Is there a more reliable way to write a large data set to disk? I guess you can define "large" as the point where the size of the data set approaches the amount of available memory, though I am not sure how accurate that is.

  2. Should I move to a more database-centric approach?

  3. Can anyone point me to some literature on reading portions of a large data set from a disk file in C#?

Recommended answer

Write the entries to the file yourself, one at a time. One simple solution would be something like:

// Requires: using System.IO; using System.Runtime.Serialization.Formatters.Binary;
StateInformation[] diskReady = GenerateStateGraph();
BinaryFormatter bf = new BinaryFormatter();
using (Stream file = File.OpenWrite(@"C:\temp\states.dat"))
{
  foreach (StateInformation si in diskReady)
  {
    using (MemoryStream ms = new MemoryStream())
    {
      // Serialize one entry at a time, not the whole array.
      bf.Serialize(ms, si);
      byte[] ser = ms.ToArray();
      int len = ser.Length;
      // Write the length as a 4-byte little-endian prefix.
      // Shift first, then mask, then cast, so each call gets a single byte.
      file.WriteByte((byte)(len & 0xFF));
      file.WriteByte((byte)((len >> 8) & 0xFF));
      file.WriteByte((byte)((len >> 16) & 0xFF));
      file.WriteByte((byte)((len >> 24) & 0x7F));
      file.Write(ser, 0, len);
    }
  }
}

Only the serialized form of a single StateInformation object has to be held in memory at a time. To deserialise, you read four bytes, reconstruct the length from them, create a buffer of that size, fill it, and deserialise it.
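The reading side described above could look roughly like the sketch below. It only handles the framing (the 4-byte little-endian length followed by the payload) and hands back raw byte arrays. The class name `LengthPrefixedReader` and its methods are made up for illustration; they are not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper (name invented for this sketch): reads records that were
// written as a 4-byte little-endian length followed by that many payload bytes.
public static class LengthPrefixedReader
{
    // Lazily yields one payload at a time, so only one record is in memory.
    public static IEnumerable<byte[]> ReadRecords(Stream input)
    {
        byte[] header = new byte[4];
        while (true)
        {
            int got = ReadFully(input, header, 4);
            if (got == 0) yield break; // clean end of file
            if (got < 4) throw new EndOfStreamException("Truncated length prefix.");

            // Rebuild the length from the four bytes, lowest byte first.
            int len = header[0] | (header[1] << 8) | (header[2] << 16) | (header[3] << 24);
            if (len < 0) throw new InvalidDataException("Negative record length.");

            byte[] payload = new byte[len];
            if (ReadFully(input, payload, len) < len)
                throw new EndOfStreamException("Truncated record.");
            yield return payload;
        }
    }

    // Stream.Read may return fewer bytes than requested, so loop until the
    // buffer is full or the stream ends. Returns the number of bytes read.
    private static int ReadFully(Stream s, byte[] buffer, int count)
    {
        int total = 0;
        while (total < count)
        {
            int n = s.Read(buffer, total, count - total);
            if (n == 0) break;
            total += n;
        }
        return total;
    }
}
```

Each payload can then be wrapped in a MemoryStream and passed to bf.Deserialize, with the result cast to StateInformation. Because the method is an iterator, a foreach over ReadRecords keeps only the current record's buffer alive.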

All of the above could be seriously optimised for speed, memory use and disk-size if you create a more specialised format, but the above goes to show the principle.
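As one possible take on a "more specialised format", here is a sketch that writes each entry's fields directly with BinaryWriter as a fixed-size record. The fields of StateInformation are not shown in the question, so `StateRecord` and its three fields are purely hypothetical stand-ins, as is the `StateFile` class. A side effect of fixed-size records is that a single entry can be read by seeking to `index * RecordSize`, which also touches on the question about reading portions of a large file.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for StateInformation; the real fields are unknown.
public struct StateRecord
{
    public int Id;
    public int Parent;
    public double Cost;
}

public static class StateFile
{
    // Two 4-byte ints plus one 8-byte double per record.
    public const int RecordSize = 4 + 4 + 8;

    // Writes every record field by field; no per-object buffers are needed.
    public static void WriteAll(Stream output, StateRecord[] states)
    {
        // The writer is deliberately not disposed, so the caller's stream stays open.
        var writer = new BinaryWriter(output);
        foreach (StateRecord s in states)
        {
            writer.Write(s.Id);
            writer.Write(s.Parent);
            writer.Write(s.Cost);
        }
        writer.Flush();
    }

    // Reads a single record by seeking straight to its offset.
    public static StateRecord ReadAt(Stream input, long index)
    {
        input.Seek(index * RecordSize, SeekOrigin.Begin);
        var reader = new BinaryReader(input);
        return new StateRecord
        {
            Id = reader.ReadInt32(),
            Parent = reader.ReadInt32(),
            Cost = reader.ReadDouble()
        };
    }
}
```

This only works as written if every entry has the same size. Variable-length entries would need either the length prefix shown earlier or a separate index of offsets.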
