Better/faster way to fill a big array in C#
Question
I have 3 *.dat files (346KB, 725KB, 1762KB) that are filled with a JSON string of "big" int arrays.
Each time my object is created (several times) I take those three files and use JsonConvert.DeserializeObject to deserialize the arrays into the object.
I thought about using binary files instead of a JSON string, or could I even save these arrays directly? I don't need to use these files; it's just where the data is currently saved. I would gladly switch to anything faster.
What are some different ways to speed up the initialization of these objects?
Answer
The fastest way is to manually serialize the data.
An easy way to do this is by creating a FileStream and then wrapping it in a BinaryWriter/BinaryReader.
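For instance, the wrapping could look like this (a minimal sketch; the file name is just an example):

```csharp
using System;
using System.IO;

class Example
{
    static void Main()
    {
        // Write a few ints, then read them back.
        using (var writer = new BinaryWriter(new FileStream("numbers.dat", FileMode.Create)))
        {
            writer.Write(42);
            writer.Write(1337);
        }

        using (var reader = new BinaryReader(new FileStream("numbers.dat", FileMode.Open)))
        {
            Console.WriteLine(reader.ReadInt32()); // 42
            Console.WriteLine(reader.ReadInt32()); // 1337
        }
    }
}
```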
These give you access to functions for writing the basic data types (numeric types, string, char, byte[] and char[]).
An easy way to write an int[] (unnecessary if it's a fixed size) is to prepend the length of the array as either an int or a long (depending on the size; unsigned doesn't really give any advantage, since arrays use signed data types for their length). Then write all the ints.
Two ways to write all the ints would be:
1. Simply loop over the entire array.
2. Convert it into a byte[] and write it using BinaryWriter.Write(byte[]).
Here is how you can implement both:
// Writing
BinaryWriter writer = new BinaryWriter(new FileStream(...));
int[] intArr = new int[1000];
writer.Write(intArr.Length);
for (int i = 0; i < intArr.Length; i++)
writer.Write(intArr[i]);
// Reading
BinaryReader reader = new BinaryReader(new FileStream(...));
int[] intArr = new int[reader.ReadInt32()];
for (int i = 0; i < intArr.Length; i++)
intArr[i] = reader.ReadInt32();
// Writing, method 2
BinaryWriter writer = new BinaryWriter(new FileStream(...));
int[] intArr = new int[1000];
byte[] byteArr = new byte[intArr.Length * sizeof(int)];
Buffer.BlockCopy(intArr, 0, byteArr, 0, intArr.Length * sizeof(int));
writer.Write(intArr.Length);
writer.Write(byteArr);
// Reading, method 2
BinaryReader reader = new BinaryReader(new FileStream(...));
int[] intArr = new int[reader.ReadInt32()];
byte[] byteArr = reader.ReadBytes(intArr.Length * sizeof(int));
Buffer.BlockCopy(byteArr, 0, intArr, 0, byteArr.Length);
I decided to put this all to the test: with an array of 10000 integers, I ran each test 10000 times.
Method 1 consumed on average 888,200 ns on my system (about 0.89 ms), while method 2 consumed only 568,600 ns on average (about 0.57 ms).
Both times include the work the garbage collector has to do.
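A timing harness along these lines could reproduce such a comparison (a sketch, not the answerer's original benchmark; it uses Stopwatch and a MemoryStream to keep disk I/O out of the measurement, and fewer runs to keep it quick — absolute numbers will differ per system):

```csharp
using System;
using System.Diagnostics;
using System.IO;

class Benchmark
{
    static void Main()
    {
        int[] intArr = new int[10000];
        for (int i = 0; i < intArr.Length; i++) intArr[i] = i;

        const int runs = 1000;

        // Method 1: loop over the array, writing one int at a time.
        var sw = Stopwatch.StartNew();
        for (int run = 0; run < runs; run++)
        {
            using (var writer = new BinaryWriter(new MemoryStream()))
            {
                writer.Write(intArr.Length);
                for (int i = 0; i < intArr.Length; i++)
                    writer.Write(intArr[i]);
            }
        }
        sw.Stop();
        Console.WriteLine($"Method 1: {sw.Elapsed.TotalMilliseconds / runs} ms per run");

        // Method 2: block-copy the whole array into a byte[] and write it at once.
        sw.Restart();
        for (int run = 0; run < runs; run++)
        {
            using (var writer = new BinaryWriter(new MemoryStream()))
            {
                byte[] byteArr = new byte[intArr.Length * sizeof(int)];
                Buffer.BlockCopy(intArr, 0, byteArr, 0, byteArr.Length);
                writer.Write(intArr.Length);
                writer.Write(byteArr);
            }
        }
        sw.Stop();
        Console.WriteLine($"Method 2: {sw.Elapsed.TotalMilliseconds / runs} ms per run");
    }
}
```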
Obviously method 2 is faster than method 1, though possibly less readable.
Another reason why method 1 can be better than method 2 is that method 2 requires twice as much free RAM as the data you're going to write (the original int[] and the byte[] converted from it), which matters when dealing with limited RAM or extremely large files (talking about 512MB+). If that's the case, though, you can always make a hybrid solution, for example by writing away 128MB at a time.
Note that method 1 also requires this extra space, but because it's split into one operation per item of the int[], it can release the memory a lot earlier.
Something like this will write 128MB of an int[] at a time:
const int WRITECOUNT = 32 * 1024 * 1024; // ints per chunk: 32M ints * sizeof(int) = 128MB
int[] intArr = new int[140 * 1024 * 1024]; // 140M ints * sizeof(int) = 560MB
for (int i = 0; i < intArr.Length; i++)
    intArr[i] = i;
byte[] byteArr = new byte[WRITECOUNT * sizeof(int)]; // 128MB buffer
int dataDone = 0;
using (Stream fileStream = new FileStream("data.dat", FileMode.Create))
using (BinaryWriter writer = new BinaryWriter(fileStream))
{
    while (dataDone < intArr.Length)
    {
        int dataToWrite = intArr.Length - dataDone;
        if (dataToWrite > WRITECOUNT) dataToWrite = WRITECOUNT;
        // Buffer.BlockCopy offsets and counts are in bytes, not array elements
        Buffer.BlockCopy(intArr, dataDone * sizeof(int), byteArr, 0, dataToWrite * sizeof(int));
        // Only write the bytes actually copied this iteration (the last chunk may be partial)
        writer.Write(byteArr, 0, dataToWrite * sizeof(int));
        dataDone += dataToWrite;
    }
}
Note that this is just for writing; reading works differently too :P. I hope this gives you some more insight into dealing with very large data files :).
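A matching chunked reader might look like this (a sketch kept symmetric with the chunked writer above; it assumes the file contains raw ints with no length prefix, so the count is derived from the file length):

```csharp
using System;
using System.IO;

class ChunkedRead
{
    const int READCOUNT = 32 * 1024 * 1024; // ints per chunk (128MB)

    static int[] ReadIntArray(string path)
    {
        using (var fileStream = new FileStream(path, FileMode.Open))
        using (var reader = new BinaryReader(fileStream))
        {
            int totalInts = (int)(fileStream.Length / sizeof(int));
            int[] intArr = new int[totalInts];
            int dataDone = 0;
            while (dataDone < totalInts)
            {
                int dataToRead = Math.Min(totalInts - dataDone, READCOUNT);
                byte[] chunk = reader.ReadBytes(dataToRead * sizeof(int));
                // BlockCopy offsets and counts are in bytes
                Buffer.BlockCopy(chunk, 0, intArr, dataDone * sizeof(int), chunk.Length);
                dataDone += dataToRead;
            }
            return intArr;
        }
    }

    static void Main()
    {
        // Small round-trip demonstration: write 10 ints, read them back.
        int[] original = new int[10];
        for (int i = 0; i < original.Length; i++) original[i] = i * i;
        byte[] bytes = new byte[original.Length * sizeof(int)];
        Buffer.BlockCopy(original, 0, bytes, 0, bytes.Length);
        File.WriteAllBytes("data.dat", bytes);

        int[] readBack = ReadIntArray("data.dat");
        Console.WriteLine(readBack[3]); // 9
    }
}
```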