Protobuf-net lazy streaming deserialization of fields

Question

Overall aim: To skip a very long field when deserializing, and when the field is accessed to read elements from it directly from the stream without loading the whole field.

Example classes: the object being serialized/deserialized is FatPropertyClass.

using System;
using System.IO;
using ProtoBuf;

[ProtoContract]
public class FatPropertyClass
{
    [ProtoMember(1)]
    private int smallProperty;

    [ProtoMember(2)]
    private FatArray2<int> fatProperty;

    [ProtoMember(3)]
    private int[] array;

    public FatPropertyClass()
    {

    }

    public FatPropertyClass(int sp, int[] fp)
    {
        smallProperty = sp;
        fatProperty = new FatArray2<int>(fp);
    }

    public int SmallProperty
    {
        get { return smallProperty; }
        set { smallProperty = value; }
    }

    public FatArray2<int> FatProperty
    {
        get { return fatProperty; }
        set { fatProperty = value; }
    }

    public int[] Array
    {
        get { return array; }
        set { array = value; }
    }
}


[ProtoContract]
public class FatArray2<T>
{
    [ProtoMember(1, DataFormat = DataFormat.FixedSize)]
    private T[] array;
    private Stream sourceStream;
    private long position;

    public FatArray2()
    {
    }

    public FatArray2(T[] array)
    {
        this.array = new T[array.Length];
        Array.Copy(array, this.array, array.Length);
    }


    [ProtoBeforeDeserialization]
    private void BeforeDeserialize(SerializationContext context)
    {
        position = ((Stream)context.Context).Position;
    }

    public T this[int index]
    {
        get
        {
            // logic to get the relevant index from the stream.
            return default(T);
        }
        set
        {
            // only relevant when full array is available for example.
        }
    }
}



I can deserialize like so: FatPropertyClass d = model.Deserialize(fileStream, null, typeof(FatPropertyClass), new SerializationContext() { Context = fileStream }) as FatPropertyClass; where model can be, for example:

    RuntimeTypeModel model = RuntimeTypeModel.Create();
    MetaType mt = model.Add(typeof(FatPropertyClass), false);
    mt.AddField(1, "smallProperty");
    mt.AddField(2, "fatProperty");
    mt.AddField(3, "array");
    MetaType mtFat = model.Add(typeof(FatArray2<int>), false);

This will skip the deserialization of array in FatArray2<T>. However, I then need to read random elements from that array at a later time. One thing I tried is to remember the stream position before deserialization in the BeforeDeserialize(SerializationContext context) method of FatArray2<T>, as in the above code: position = ((Stream)context.Context).Position;. However, this always seems to be the end of the stream.

How can I remember the stream position where the FatArray2<T> field begins, and how can I read from it at a random index?

Note: The parameter T in FatArray2<T> can be of other types marked with [ProtoContract], not just primitives. Also, there could be multiple properties of type FatArray2<T> at various depths in the object graph.

Method 2: Serialize the FatArray2<T> field after the serialization of the containing object. So, serialize FatPropertyClass with a length prefix, then serialize, each with a length prefix, all the fat arrays it contains. Mark all of these fat array properties with an attribute, and at deserialization we can remember the stream position for each of them.
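The layout idea behind Method 2 can be sketched with plain System.IO, independent of protobuf-net. This is only an illustration under assumptions: the BlockFile class, its method names, and the fixed 4-byte length prefix are hypothetical choices made for the example, not anything the library provides. The writer records the offset of every block so that a reader can seek straight to it later.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: not part of protobuf-net.
static class BlockFile
{
    // Appends each block as [4-byte little-endian length][payload] and returns
    // the offset of every length prefix, so a reader can seek to a block later.
    public static List<long> WriteBlocks(Stream target, IEnumerable<byte[]> blocks)
    {
        var offsets = new List<long>();
        var writer = new BinaryWriter(target);
        foreach (var block in blocks)
        {
            offsets.Add(target.Position);
            writer.Write(block.Length);
            writer.Write(block);
        }
        writer.Flush();
        return offsets;
    }

    // Seeks to a recorded offset and reads exactly one block back.
    public static byte[] ReadBlock(Stream source, long offset)
    {
        source.Position = offset;
        var reader = new BinaryReader(source);
        int length = reader.ReadInt32();
        byte[] payload = reader.ReadBytes(length);
        if (payload.Length != length) throw new EndOfStreamException();
        return payload;
    }
}
```

In this sketch the first block would hold the serialized containing object and the following blocks the fat arrays; the offsets would have to be stored somewhere (for example in the first block) for a later reader to use them.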

Then the question is how do we read primitives out of it? This works OK for classes, using T item = Serializer.DeserializeItems<T>(sourceStream, PrefixStyle.Base128, Serializer.ListItemTag).Skip(index).First(); to get the item at index index. But how does this work for primitives? An array of primitives does not seem to be deserializable using DeserializeItems.

Is DeserializeItems with LINQ used like that even OK? Does it do what I assume it does (internally skip through the stream to the correct element - at worst reading each length prefix and skipping it)?
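For reference, "reading each length prefix and skipping it" could look like the following hand-rolled sketch. It assumes a simplified layout of [varint length][payload] records on a seekable stream; the layout protobuf-net actually writes may differ (for instance, a field header before each length when a field number is supplied), and nothing here claims to show what DeserializeItems does internally. LengthPrefixedSkipper and its methods are hypothetical names.

```csharp
using System;
using System.IO;

// Hypothetical helper: not part of protobuf-net.
static class LengthPrefixedSkipper
{
    // Reads a base-128 varint (7 payload bits per byte, high bit = "more").
    static int ReadVarint32(Stream source)
    {
        int result = 0;
        for (int shift = 0; shift < 35; shift += 7)
        {
            int b = source.ReadByte();
            if (b < 0) throw new EndOfStreamException();
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
        }
        throw new InvalidDataException("varint too long");
    }

    // Returns the payload of record `index`, seeking past earlier payloads
    // instead of reading them. Requires a seekable stream.
    public static byte[] ReadNthRecord(Stream source, int index)
    {
        for (int i = 0; ; i++)
        {
            int length = ReadVarint32(source);
            if (i == index)
            {
                var payload = new byte[length];
                int read = 0;
                while (read < length)
                {
                    int n = source.Read(payload, read, length - read);
                    if (n == 0) throw new EndOfStreamException();
                    read += n;
                }
                return payload;
            }
            source.Seek(length, SeekOrigin.Current);
        }
    }
}
```

The cost is one small read per skipped record rather than a full deserialization of it, which is the behaviour the question hopes for; an index past the last record ends in an EndOfStreamException.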

Regards, Iulian

Recommended answer

This question depends an awful lot on the actual model - it isn't a scenario that the library specifically targets to make convenient. I suspect that your best bet here would be to write the reader manually using ProtoReader. Note that there are some tricks when it comes to reading selected items if the outermost object is a List<SomeType> or similar, but internal objects are typically either simply read or skipped.

By starting again from the root of the document via ProtoReader, you could seek fairly efficiently to the nth item. I can do a concrete example later if you like (I haven't leapt in unless you're sure it will actually be useful). For reference, the reason the stream's position isn't useful here is: the library aggressively over-reads and buffers data, unless you specifically tell it to limit its length. This is because data like "varint" is hard to read efficiently without lots of buffering, as it would end up being a lot of individual calls to ReadByte(), rather than just working with a local buffer.
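The over-reading effect can be reproduced without protobuf-net. The sketch below uses the BCL's BufferedStream purely as an analogy for any buffering reader; it is not the library's internal reader, and OverReadDemo is a hypothetical name. After a single byte has been consumed through the buffer, the underlying stream has already advanced by a whole buffer's worth, or to its end if the payload is smaller than the buffer.

```csharp
using System;
using System.IO;

// Hypothetical demo class: an analogy, not protobuf-net code.
static class OverReadDemo
{
    // Reads one byte through a buffering wrapper and reports how far the
    // underlying stream has advanced as a result.
    public static long UnderlyingPositionAfterOneByte(int payloadLength, int bufferSize)
    {
        var source = new MemoryStream(new byte[payloadLength]);
        using (var buffered = new BufferedStream(source, bufferSize))
        {
            buffered.ReadByte();
            return source.Position; // position of the raw stream, not of the wrapper
        }
    }
}
```

With a 100-byte payload and a 4096-byte buffer, the underlying position is already at the end of the stream after the first byte, which matches the symptom described in the question.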

This is a completely untested version of reading the n-th array item of the sub-property directly from a reader; note that it would be inefficient to call this lots of times one after the other, but it should be obvious how to change it to read a range of consecutive values, etc:

static int? ReadNthArrayItem(Stream source, int index, int maxLen)
{
    using (var reader = new ProtoReader(source, null, null, maxLen))
    {
        int field, count = 0;
        while ((field = reader.ReadFieldHeader()) > 0)
        {
            switch (field)
            {
                case 2: // fat property; a sub object
                    var tok = ProtoReader.StartSubItem(reader);
                    while ((field = reader.ReadFieldHeader()) > 0)
                    {
                        switch (field)
                        {
                            case 1: // the array field
                                if(count++ == index)
                                    return reader.ReadInt32();
                                reader.SkipField();
                                break;
                            default:
                                reader.SkipField();
                                break;
                        }
                    }
                    ProtoReader.EndSubItem(tok, reader);
                    break;
                default:
                    reader.SkipField();
                    break;
            }
        }
    }
    return null;
}



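Finally, note that if this is a large array, you might want to use "packed" arrays (see the protobuf documentation, but this basically stores the items without a header per item). This would be a lot more efficient, but note that it requires slightly different reading code. You enable packed arrays by adding IsPacked = true to the [ProtoMember(...)] for the array.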
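One consequence of packing, when combined with DataFormat.FixedSize, is that every int occupies exactly four little-endian bytes in one contiguous run, so element n sits at a computable offset. The sketch below reads such an element directly with System.IO. It assumes the offset and byte length of the packed payload are already known (for example, recorded by a manual reader); PackedFixed32, payloadStart and payloadLength are hypothetical names introduced for this example.

```csharp
using System;
using System.IO;

// Hypothetical helper: not part of protobuf-net.
static class PackedFixed32
{
    // Reads element `index` from a packed run of fixed32 values whose first
    // payload byte sits at `payloadStart` (assumed to be known by the caller).
    public static int ReadAt(Stream source, long payloadStart, int payloadLength, int index)
    {
        if (index < 0 || (long)index * 4 + 4 > payloadLength)
            throw new ArgumentOutOfRangeException(nameof(index));

        source.Position = payloadStart + (long)index * 4;
        var buffer = new byte[4];
        int read = 0;
        while (read < 4)
        {
            int n = source.Read(buffer, read, 4 - read);
            if (n == 0) throw new EndOfStreamException();
            read += n;
        }
        // fixed32 values are little-endian on the wire
        return buffer[0] | (buffer[1] << 8) | (buffer[2] << 16) | (buffer[3] << 24);
    }
}
```

This only applies to fixed-size primitives; varint-encoded values and [ProtoContract] element types have variable lengths and still need to be walked item by item.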
