Fast serialization/deserialization of structs
Question
I have a huge amount of geographic data represented in a simple object structure consisting only of structs. All of my fields are of value type.
public struct Child
{
    readonly float X;
    readonly float Y;
    readonly int myField;
}

public struct Parent
{
    readonly int id;
    readonly int field1;
    readonly int field2;
    readonly Child[] children;
}
The data is chunked up nicely into small Parent[] arrays. Each array contains a few thousand Parent instances. I have far too much data to keep it all in memory, so I need to swap these chunks to disk and back. (One file would come to approx. 200-300 KB.)
What would be the most efficient way of serializing/deserializing a Parent[] to a byte[] for dumping to disk and reading back? Concerning speed, I am particularly interested in fast deserialization; write speed is not that critical.
Would a simple BinarySerializer be good enough? Or should I hack around with StructLayout (see accepted answer)? I am not sure whether that would work with the array field Parent.children.
UPDATE: In response to comments: yes, the objects are immutable (code updated), and indeed the children field is not a value type. 300 KB does not sound like much, but I have zillions of files like that, so speed does matter.
Recommended answer
BinarySerializer is a very general serializer. It will not perform as well as a custom implementation.
Fortunately for you, your data consists of structs only. This means that you can fix a StructLayout for Child and simply bit-copy the children array, using unsafe code, from a byte[] you have read from disk.
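As a rough illustration of the fixed-layout idea, here is a minimal sketch of a round trip between Child[] and byte[]. It swaps the pointer-based copy for the span-based MemoryMarshal API (available in .NET Core 2.1+ or via the System.Memory package), which performs the same bit-copy without an unsafe block. The public fields, the constructor, and the helper name ChildCodec are assumptions made for the example; they are not part of the question's code. The bytes are written in the machine's native byte order, so the files are only portable between machines of the same endianness.

```csharp
using System;
using System.Runtime.InteropServices;

// Sequential layout with Pack = 1 pins the layout to exactly 12 bytes:
// float X, float Y, int myField, with no padding.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public readonly struct Child
{
    public readonly float X;
    public readonly float Y;
    public readonly int myField;

    // Constructor added for the example (assumption).
    public Child(float x, float y, int myField)
    {
        X = x;
        Y = y;
        this.myField = myField;
    }
}

// Hypothetical helper, not from the original answer.
public static class ChildCodec
{
    // Views the array as raw bytes and copies them into a new byte[].
    public static byte[] ToBytes(Child[] children)
    {
        return MemoryMarshal.AsBytes(children.AsSpan()).ToArray();
    }

    // Reinterprets the bytes as Child values and copies them into a new array.
    public static Child[] FromBytes(byte[] bytes)
    {
        return MemoryMarshal.Cast<byte, Child>(bytes.AsSpan()).ToArray();
    }
}
```

Deserialization here is a single block copy, which is what makes this approach attractive when read speed matters more than write speed.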
For the parents it is not that easy, because you need to treat the children separately. I recommend you use unsafe code to copy the bit-copyable fields from the byte[] you read, and deserialize the children separately.
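One way this could look is sketched below, under an assumed file layout that the answer does not specify: an int parent count, then for each parent four ints (id, field1, field2, child count) followed by that parent's children as raw 12-byte records. The layout, the constructors, the public fields, and the name ParentCodec are all assumptions for illustration. The sketch again uses spans rather than pointers, and it assumes a little-endian machine so that the raw Child bytes agree with the little-endian header ints.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential, Pack = 1)]
public readonly struct Child
{
    public readonly float X;
    public readonly float Y;
    public readonly int myField;
    public Child(float x, float y, int myField) { X = x; Y = y; this.myField = myField; }
}

public readonly struct Parent
{
    public readonly int id;
    public readonly int field1;
    public readonly int field2;
    public readonly Child[] children;
    public Parent(int id, int field1, int field2, Child[] children)
    {
        this.id = id; this.field1 = field1; this.field2 = field2; this.children = children;
    }
}

// Hypothetical helper; the file layout is an assumption, not the answer's.
public static class ParentCodec
{
    public static byte[] Serialize(Parent[] parents)
    {
        using (var ms = new MemoryStream())
        using (var w = new BinaryWriter(ms)) // BinaryWriter writes ints little-endian
        {
            w.Write(parents.Length);
            foreach (var p in parents)
            {
                w.Write(p.id);
                w.Write(p.field1);
                w.Write(p.field2);
                w.Write(p.children.Length);
                // Children are written as one raw block per parent.
                w.Write(MemoryMarshal.AsBytes(p.children.AsSpan()).ToArray());
            }
            w.Flush();
            return ms.ToArray();
        }
    }

    public static Parent[] Deserialize(byte[] bytes)
    {
        ReadOnlySpan<byte> data = bytes;
        int childSize = Marshal.SizeOf<Child>(); // 12 for this layout
        int pos = 0;
        var parents = new Parent[ReadInt32(data, ref pos)];
        for (int i = 0; i < parents.Length; i++)
        {
            int id = ReadInt32(data, ref pos);
            int field1 = ReadInt32(data, ref pos);
            int field2 = ReadInt32(data, ref pos);
            int childBytes = ReadInt32(data, ref pos) * childSize;
            // Bit-copy this parent's children in a single block.
            Child[] children = MemoryMarshal.Cast<byte, Child>(data.Slice(pos, childBytes)).ToArray();
            pos += childBytes;
            parents[i] = new Parent(id, field1, field2, children);
        }
        return parents;
    }

    private static int ReadInt32(ReadOnlySpan<byte> data, ref int pos)
    {
        int value = BinaryPrimitives.ReadInt32LittleEndian(data.Slice(pos));
        pos += sizeof(int);
        return value;
    }
}
```

The per-parent cost on the read side is four int reads plus one block copy and one array allocation; the children themselves are never parsed field by field.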
Did you consider mapping all the children into memory using memory-mapped files? You could then reuse the operating system's cache facility and not deal with reading and writing at all.
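To make the memory-mapped suggestion a little more concrete, here is a minimal sketch that reads a range of Child records through a MemoryMappedViewAccessor. It assumes a file that contains nothing but tightly packed 12-byte Child records in native byte order; that file format, the public fields, and the name MappedChildren are assumptions for the example. A real setup would probably keep the mapping open for the lifetime of the data set rather than reopening it on every call as this sketch does.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential, Pack = 1)]
public readonly struct Child
{
    public readonly float X;
    public readonly float Y;
    public readonly int myField;
    public Child(float x, float y, int myField) { X = x; Y = y; this.myField = myField; }
}

// Hypothetical helper, not from the original answer.
public static class MappedChildren
{
    // Reads `count` records starting at element index `firstIndex`.
    public static Child[] ReadRange(string path, long firstIndex, int count)
    {
        int size = Marshal.SizeOf<Child>(); // 12 bytes for this layout
        long fileLength = new FileInfo(path).Length;
        // The mapped view can be larger than the file (rounded up to a page),
        // so the range is checked against the real file length.
        if (firstIndex < 0 || count < 0 || (firstIndex + count) * size > fileLength)
            throw new ArgumentOutOfRangeException(nameof(count), "Range exceeds the file.");

        var result = new Child[count];
        using (var mmf = MemoryMappedFile.CreateFromFile(
            path, FileMode.Open, null, 0, MemoryMappedFileAccess.Read))
        using (var view = mmf.CreateViewAccessor(0, 0, MemoryMappedFileAccess.Read))
        {
            // The OS pages the data in on demand and caches it.
            view.ReadArray(firstIndex * size, result, 0, count);
        }
        return result;
    }
}
```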
Zero-copy deserialization of a Child[] looks like this:
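// Must run inside an unsafe context (compile with unsafe code enabled).
byte[] bytes = GetFromDisk();
int length = GetLengthOfDeserializedData(); // number of Child elements in bytes
fixed (byte* bytePtr = bytes)
{
    Child* childPtr = (Child*)bytePtr;
    // Now treat childPtr as an array (valid only inside this fixed block):
    var x123 = childPtr[123].X;
    // If we need a real array that can be passed around, we need to copy:
    var childArray = new Child[length];
    for (int i = 0; i < length; i++)
    {
        childArray[i] = childPtr[i];
    }
}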