Better/faster way to fill a big array in C#


Problem description

I have 3 *.dat files (346KB, 725KB, 1762KB) that are filled with a JSON string of "big" int arrays.

Each time my object is created (several times) I take those three files and use JsonConvert.DeserializeObject to deserialize the arrays into the object.
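For context, the setup described above presumably looks something like the following sketch; the file name is an assumption, and it requires the Newtonsoft.Json package:

```csharp
using System;
using System.IO;
using Newtonsoft.Json;

// Demonstration setup: in the question these .dat files already exist on disk.
File.WriteAllText("arrays1.dat", JsonConvert.SerializeObject(new[] { 1, 2, 3 }));

// Hypothetical sketch of the current approach: every construction of the
// object re-reads and re-parses the JSON text of each file.
int[] arr1 = JsonConvert.DeserializeObject<int[]>(File.ReadAllText("arrays1.dat"));
Console.WriteLine(arr1.Length); // 3
```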

I thought about using binary files instead of a JSON string, or could I even save these arrays directly? I don't need to use these files; it's just the location the data is currently saved. I would gladly switch to anything faster.

What are the different ways to speed up the initialization of these objects?

Answer

The fastest way is to manually serialize the data.

An easy way to do this is by creating a FileStream, and then wrapping it in a BinaryWriter/BinaryReader.

These give you access to functions for writing and reading the basic data types (numbers, string, char, byte[] and char[]).

An easy way to write an int[] (unnecessary if it's a fixed size) is by prepending the length of the array as either an int or a long (depending on the size; unsigned doesn't really give any advantage, since arrays use signed data types for their length storage), and then writing all the ints.

Two ways to write all the ints would be:
1. Simply loop over the entire array.
2. Convert it into a byte[] and write it using BinaryWriter.Write(byte[]).

This is how you can implement them both:

// Writing
BinaryWriter writer = new BinaryWriter(new FileStream(...));
int[] intArr = new int[1000];

writer.Write(intArr.Length);
for (int i = 0; i < intArr.Length; i++)
    writer.Write(intArr[i]);

// Reading
BinaryReader reader = new BinaryReader(new FileStream(...));
int[] intArr = new int[reader.ReadInt32()];

for (int i = 0; i < intArr.Length; i++)
    intArr[i] = reader.ReadInt32();

// Writing, method 2
BinaryWriter writer = new BinaryWriter(new FileStream(...));
int[] intArr = new int[1000];
byte[] byteArr = new byte[intArr.Length * sizeof(int)];
Buffer.BlockCopy(intArr, 0, byteArr, 0, intArr.Length * sizeof(int));

writer.Write(intArr.Length);
writer.Write(byteArr);

// Reading, method 2
BinaryReader reader = new BinaryReader(new FileStream(...));
int[] intArr = new int[reader.ReadInt32()];
byte[] byteArr = reader.ReadBytes(intArr.Length * sizeof(int));
Buffer.BlockCopy(byteArr, 0, intArr, 0, byteArr.Length);

I decided to put this all to the test: with an array of 10000 integers, I ran each method 10000 times.
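The answer doesn't show the harness that produced the timings below; a sketch along these lines could produce comparable numbers. The MemoryStream (used so disk speed stays out of the measurement) and the exact run counts are assumptions:

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Illustrative benchmark harness (not the original author's code): times
// method 1 (per-int writes) against method 2 (Buffer.BlockCopy + one Write).
int[] intArr = new int[10000];
for (int i = 0; i < intArr.Length; i++) intArr[i] = i;
const int RUNS = 10000;

var sw = Stopwatch.StartNew();
for (int run = 0; run < RUNS; run++)
{
    using (var writer = new BinaryWriter(new MemoryStream()))
    {
        writer.Write(intArr.Length);
        for (int i = 0; i < intArr.Length; i++)
            writer.Write(intArr[i]);
    }
}
sw.Stop();
Console.WriteLine($"Method 1: {sw.Elapsed.TotalMilliseconds / RUNS} ms per run");

sw.Restart();
for (int run = 0; run < RUNS; run++)
{
    using (var writer = new BinaryWriter(new MemoryStream()))
    {
        byte[] byteArr = new byte[intArr.Length * sizeof(int)];
        Buffer.BlockCopy(intArr, 0, byteArr, 0, byteArr.Length);
        writer.Write(intArr.Length);
        writer.Write(byteArr);
    }
}
sw.Stop();
Console.WriteLine($"Method 2: {sw.Elapsed.TotalMilliseconds / RUNS} ms per run");
```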

Method 1 consumed on average 888200 ns on my system (about 0.89 ms), while method 2 consumed only 568600 ns on average (about 0.57 ms).

Both times include the work the garbage collector has to do.

Obviously method 2 is faster than method 1, though possibly less readable.

Another reason method 1 can be better than method 2 is that method 2 requires twice as much free RAM as the data you're going to write (the original int[] plus the byte[] converted from it), which matters when dealing with limited RAM or extremely large files (512MB+). If that is the case, you can always make a hybrid solution, for example by writing away 128MB at a time.

Note that method 1 also requires this extra space, but because it's split down into one operation per item of the int[], it can release the memory a lot earlier.

Something like this will write an int[] 128MB at a time:

const int WRITECOUNT = 32 * 1024 * 1024; // ints per chunk: 32M ints * sizeof(int) = 128MB

int[] intArr = new int[140 * 1024 * 1024]; // 140M ints * sizeof(int) = 560MB
for (int i = 0; i < intArr.Length; i++)
    intArr[i] = i;

byte[] byteArr = new byte[WRITECOUNT * sizeof(int)]; // 128MB buffer

int dataDone = 0;

using (Stream fileStream = new FileStream("data.dat", FileMode.Create))
using (BinaryWriter writer = new BinaryWriter(fileStream))
{
    while (dataDone < intArr.Length)
    {
        int dataToWrite = intArr.Length - dataDone;
        if (dataToWrite > WRITECOUNT) dataToWrite = WRITECOUNT;
        // Buffer.BlockCopy offsets are in bytes, so scale the source offset by sizeof(int)
        Buffer.BlockCopy(intArr, dataDone * sizeof(int), byteArr, 0, dataToWrite * sizeof(int));
        // only write the bytes actually copied; the final chunk may be smaller
        writer.Write(byteArr, 0, dataToWrite * sizeof(int));
        dataDone += dataToWrite;
    }
}

Note that this is just for writing; reading works differently too :P. I hope this gives you some more insight into dealing with very large data files :).
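A minimal sketch of the reading counterpart might look like this. It assumes the raw-int format of the chunked writer above (no length prefix, so the count is derived from the file size); the small demo file and chunk size are assumptions for illustration:

```csharp
using System;
using System.IO;

// Demonstration setup: write a small file in the same raw-int format
// (in practice "data.dat" would come from the chunked writer above).
int[] source = new int[100000];
for (int i = 0; i < source.Length; i++) source[i] = i;
byte[] raw = new byte[source.Length * sizeof(int)];
Buffer.BlockCopy(source, 0, raw, 0, raw.Length);
File.WriteAllBytes("data.dat", raw);

// Chunked reading: read READCOUNT ints at a time and Buffer.BlockCopy
// them back into the int[] (the destination offset is in bytes).
const int READCOUNT = 32 * 1024; // ints per chunk; the writer above used 32M

using (Stream fileStream = new FileStream("data.dat", FileMode.Open))
using (BinaryReader reader = new BinaryReader(fileStream))
{
    // no length prefix was written, so derive the count from the file size
    int[] intArr = new int[fileStream.Length / sizeof(int)];
    int dataDone = 0;

    while (dataDone < intArr.Length)
    {
        int dataToRead = Math.Min(intArr.Length - dataDone, READCOUNT);
        byte[] chunk = reader.ReadBytes(dataToRead * sizeof(int));
        Buffer.BlockCopy(chunk, 0, intArr, dataDone * sizeof(int), chunk.Length);
        dataDone += dataToRead;
    }

    Console.WriteLine(intArr[99999] == 99999); // True if the round-trip succeeded
}
```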
