Better/faster way to fill a big array in C#


Problem description

I have 3 *.dat files (346KB, 725KB, 1762KB) that are filled with a JSON string of "big" int arrays.

Each time my object is created (several times) I take those three files and use JsonConvert.DeserializeObject to deserialize the arrays into the object.
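For context, the setup described above presumably looks something like the following sketch; the file name is an assumption, and it requires the Newtonsoft.Json package:

```csharp
using System;
using System.IO;
using Newtonsoft.Json;

// Demonstration setup: in the question these .dat files already exist on disk.
File.WriteAllText("arrays1.dat", JsonConvert.SerializeObject(new[] { 1, 2, 3 }));

// Hypothetical sketch of the current approach: every construction of the
// object re-reads and re-parses the JSON text of each file.
int[] arr1 = JsonConvert.DeserializeObject<int[]>(File.ReadAllText("arrays1.dat"));
Console.WriteLine(arr1.Length); // 3
```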

I thought about using binary files instead of a JSON string, or could I even save these arrays directly? I don't need to use these files; it's just the location the data is currently saved. I would gladly switch to anything faster.

What are the different ways to speed up the initialization of these objects?

Answer

The fastest way is to manually serialize the data.

An easy way to do this is by creating a FileStream, and then wrapping it in a BinaryWriter/BinaryReader.

These give you access to functions for writing and reading the basic data types (numbers, string, char, byte[] and char[]).

An easy way to write an int[] (unnecessary if it's a fixed size) is by prepending the length of the array as either an int or a long (depending on the size; unsigned doesn't really give any advantage, since arrays use signed data types for their length storage), and then writing all the ints.

Two ways to write all the ints would be:
1. Simply loop over the entire array.
2. Convert it into a byte[] and write it using BinaryWriter.Write(byte[]).

This is how you can implement them both:

// Writing
BinaryWriter writer = new BinaryWriter(new FileStream(...));
int[] intArr = new int[1000];

writer.Write(intArr.Length);
for (int i = 0; i < intArr.Length; i++)
    writer.Write(intArr[i]);

// Reading
BinaryReader reader = new BinaryReader(new FileStream(...));
int[] intArr = new int[reader.ReadInt32()];

for (int i = 0; i < intArr.Length; i++)
    intArr[i] = reader.ReadInt32();

// Writing, method 2
BinaryWriter writer = new BinaryWriter(new FileStream(...));
int[] intArr = new int[1000];
byte[] byteArr = new byte[intArr.Length * sizeof(int)];
Buffer.BlockCopy(intArr, 0, byteArr, 0, intArr.Length * sizeof(int));

writer.Write(intArr.Length);
writer.Write(byteArr);

// Reading, method 2
BinaryReader reader = new BinaryReader(new FileStream(...));
int[] intArr = new int[reader.ReadInt32()];
byte[] byteArr = reader.ReadBytes(intArr.Length * sizeof(int));
Buffer.BlockCopy(byteArr, 0, intArr, 0, byteArr.Length);

I decided to put this all to the test: with an array of 10000 integers, I ran each method 10000 times.
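The answer doesn't show the harness that produced the timings below; a sketch along these lines could produce comparable numbers. The MemoryStream (used so disk speed stays out of the measurement) and the exact run counts are assumptions:

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Illustrative benchmark harness (not the original author's code): times
// method 1 (per-int writes) against method 2 (Buffer.BlockCopy + one Write).
int[] intArr = new int[10000];
for (int i = 0; i < intArr.Length; i++) intArr[i] = i;
const int RUNS = 10000;

var sw = Stopwatch.StartNew();
for (int run = 0; run < RUNS; run++)
{
    using (var writer = new BinaryWriter(new MemoryStream()))
    {
        writer.Write(intArr.Length);
        for (int i = 0; i < intArr.Length; i++)
            writer.Write(intArr[i]);
    }
}
sw.Stop();
Console.WriteLine($"Method 1: {sw.Elapsed.TotalMilliseconds / RUNS} ms per run");

sw.Restart();
for (int run = 0; run < RUNS; run++)
{
    using (var writer = new BinaryWriter(new MemoryStream()))
    {
        byte[] byteArr = new byte[intArr.Length * sizeof(int)];
        Buffer.BlockCopy(intArr, 0, byteArr, 0, byteArr.Length);
        writer.Write(intArr.Length);
        writer.Write(byteArr);
    }
}
sw.Stop();
Console.WriteLine($"Method 2: {sw.Elapsed.TotalMilliseconds / RUNS} ms per run");
```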

Method 1 consumed on average 888200 ns on my system (about 0.89 ms), while method 2 consumed only 568600 ns on average (about 0.57 ms).

Both times include the work the garbage collector has to do.

Obviously method 2 is faster than method 1, though possibly less readable.

Another reason method 1 can be better than method 2 is that method 2 requires twice as much free RAM as the data you're going to write (the original int[] plus the byte[] converted from it), which matters when dealing with limited RAM or extremely large files (512MB+). If that is the case, you can always make a hybrid solution, for example by writing away 128MB at a time.

Note that method 1 also requires this extra space, but because it's split down into one operation per item of the int[], it can release the memory a lot earlier.

Something like this will write an int[] 128MB at a time:

const int WRITECOUNT = 32 * 1024 * 1024; // ints per chunk: 32M ints * sizeof(int) = 128MB

int[] intArr = new int[140 * 1024 * 1024]; // 140M ints * sizeof(int) = 560MB
for (int i = 0; i < intArr.Length; i++)
    intArr[i] = i;

byte[] byteArr = new byte[WRITECOUNT * sizeof(int)]; // 128MB buffer

int dataDone = 0;

using (Stream fileStream = new FileStream("data.dat", FileMode.Create))
using (BinaryWriter writer = new BinaryWriter(fileStream))
{
    while (dataDone < intArr.Length)
    {
        int dataToWrite = intArr.Length - dataDone;
        if (dataToWrite > WRITECOUNT) dataToWrite = WRITECOUNT;
        // Buffer.BlockCopy offsets are in bytes, so scale the source offset by sizeof(int)
        Buffer.BlockCopy(intArr, dataDone * sizeof(int), byteArr, 0, dataToWrite * sizeof(int));
        // only write the bytes actually copied; the final chunk may be smaller
        writer.Write(byteArr, 0, dataToWrite * sizeof(int));
        dataDone += dataToWrite;
    }
}

Note that this is just for writing; reading works differently too :P. I hope this gives you some more insight into dealing with very large data files :).
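A minimal sketch of the reading counterpart might look like this. It assumes the raw-int format of the chunked writer above (no length prefix, so the count is derived from the file size); the small demo file and chunk size are assumptions for illustration:

```csharp
using System;
using System.IO;

// Demonstration setup: write a small file in the same raw-int format
// (in practice "data.dat" would come from the chunked writer above).
int[] source = new int[100000];
for (int i = 0; i < source.Length; i++) source[i] = i;
byte[] raw = new byte[source.Length * sizeof(int)];
Buffer.BlockCopy(source, 0, raw, 0, raw.Length);
File.WriteAllBytes("data.dat", raw);

// Chunked reading: read READCOUNT ints at a time and Buffer.BlockCopy
// them back into the int[] (the destination offset is in bytes).
const int READCOUNT = 32 * 1024; // ints per chunk; the writer above used 32M

using (Stream fileStream = new FileStream("data.dat", FileMode.Open))
using (BinaryReader reader = new BinaryReader(fileStream))
{
    // no length prefix was written, so derive the count from the file size
    int[] intArr = new int[fileStream.Length / sizeof(int)];
    int dataDone = 0;

    while (dataDone < intArr.Length)
    {
        int dataToRead = Math.Min(intArr.Length - dataDone, READCOUNT);
        byte[] chunk = reader.ReadBytes(dataToRead * sizeof(int));
        Buffer.BlockCopy(chunk, 0, intArr, dataDone * sizeof(int), chunk.Length);
        dataDone += dataToRead;
    }

    Console.WriteLine(intArr[99999] == 99999); // True if the round-trip succeeded
}
```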
