How to serialize/deserialize a large list of items with protobuf-net


Problem description

I have a list of about 500 million items. I am able to serialize it into a file with protobuf-net if I serialize individual items rather than a list. I cannot collect the items into a List of Price and then serialize it, because I run out of memory. So I have to serialize one record at a time:

using (var input = File.OpenText("..."))
using (var output = new FileStream("...", FileMode.Create, FileAccess.Write))
{
    string line;
    while ((line = input.ReadLine()) != null)
    {
        Price price = new Price();
        // ... code that parses line into the Price record ...

        Serializer.Serialize(output, price);
    }
}

My question is about the deserialization part. It appears that the Deserialize method does not move the Position of the stream to the next record. I tried:

using (var input = new FileStream("...", FileMode.Open, FileAccess.Read))
{
    Price price = null;
    while ((price = Serializer.Deserialize<Price>(input)) != null)
    {
    }
}

I see one real-looking Price record, and then the rest are empty records: I get a Price object back, but all of its fields are initialized to default values.

How do I properly deserialize a stream that contains a sequence of objects that were not serialized as a list?

Recommended answer

Good news! The protobuf-net API is set up for exactly this scenario. You should see a SerializeItems and DeserializeItems pair of methods that work with IEnumerable<T>, allowing streaming both in and out. The easiest way to feed it an enumerable is via an "iterator block" over the source data.

If, for whatever reason, that isn't convenient, it is 100% identical to using SerializeWithLengthPrefix and DeserializeWithLengthPrefix on a per-item basis, specifying (as parameters) field: 1 and prefix-style: base-128. You could even use SerializeWithLengthPrefix for the writing and DeserializeItems for the reading (as long as you use field 1 and base-128).
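As a rough sketch of that last combination, the code below writes each item with SerializeWithLengthPrefix and reads the items back lazily with DeserializeItems, both using field 1 and PrefixStyle.Base128. The Price members, the "symbol,value" line format and the PriceStreaming helper names are assumptions made up for illustration. The question does not show the real class or the parsing code.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using ProtoBuf;

// Hypothetical record type; the real Price class is not shown in the question.
[ProtoContract]
public class Price
{
    [ProtoMember(1)] public string Symbol { get; set; }
    [ProtoMember(2)] public double Value { get; set; }
}

public static class PriceStreaming
{
    // Iterator block: yields one Price per input line, so only one record
    // needs to be in memory at a time. The "symbol,value" format is assumed.
    public static IEnumerable<Price> ReadPrices(TextReader input)
    {
        string line;
        while ((line = input.ReadLine()) != null)
        {
            string[] parts = line.Split(',');
            yield return new Price
            {
                Symbol = parts[0],
                Value = double.Parse(parts[1], CultureInfo.InvariantCulture)
            };
        }
    }

    // Writes every item with a base-128 length prefix under field 1,
    // so that record boundaries are preserved in the output stream.
    public static void Write(Stream output, IEnumerable<Price> prices)
    {
        foreach (Price price in prices)
        {
            Serializer.SerializeWithLengthPrefix(output, price, PrefixStyle.Base128, 1);
        }
    }

    // Returns a lazy sequence; items are deserialized as the caller enumerates,
    // so the stream must stay open until enumeration has finished.
    public static IEnumerable<Price> Read(Stream input)
    {
        return Serializer.DeserializeItems<Price>(input, PrefixStyle.Base128, 1);
    }
}
```

To use it, open the text file and the output FileStream as in the question and call PriceStreaming.Write(output, PriceStreaming.ReadPrices(input)). For reading, run a foreach over PriceStreaming.Read(input) inside the using block that owns the stream. Neither direction builds a List of Price.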

Re the example: I'd have to see that in a fully reproducible scenario to comment. Actually, what I would expect there is that you only get a single object back out, containing the combined values from each object, because without the length prefix the protobuf spec assumes you are just concatenating values into a single object. The two approaches mentioned above avoid this issue.
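A small experiment along these lines may make the concatenation behaviour easier to see. It writes two objects back to back without a length prefix and then deserializes once. The Price members used here (Symbol and Venue, both strings) are hypothetical and chosen so that each object sets a different field. This is only a sketch of the behaviour the answer predicts, not a reproduction of the original scenario.

```csharp
using System;
using System.IO;
using ProtoBuf;

// Hypothetical record type for the experiment.
[ProtoContract]
public class Price
{
    [ProtoMember(1)] public string Symbol { get; set; }
    [ProtoMember(2)] public string Venue { get; set; }
}

public static class MergeDemo
{
    public static void Main()
    {
        using (var ms = new MemoryStream())
        {
            // Two separate objects, written with no length prefix between them.
            Serializer.Serialize(ms, new Price { Symbol = "ABC" });
            Serializer.Serialize(ms, new Price { Venue = "X" });
            ms.Position = 0;

            // A single Deserialize call has no record boundary to stop at, so
            // per the answer it should merge both payloads into one object.
            Price merged = Serializer.Deserialize<Price>(ms);
            Console.WriteLine("Symbol = " + merged.Symbol + ", Venue = " + merged.Venue);

            // Shows whether that one call consumed the whole stream.
            Console.WriteLine("At end of stream: " + (ms.Position == ms.Length));
        }
    }
}
```

If the first call does consume everything, any further Deserialize call on the same stream has no data left to read, which would be consistent with the empty Price objects described in the question.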
