Python example for reading multiple protobuf messages from a stream
Question
I'm working with data from spinn3r, which consists of multiple different protobuf messages serialized into a byte stream:
http://code.google.com/p/spinn3r-client/wiki/Protostream
"A protostream is a stream of protocol buffer messages, encoded on the wire as length prefixed varints according to the Google protocol buffer specification. The stream has three parts: a header, the payload, and a tail marker."
This seems like a pretty standard use case for protobufs. In fact, the protobuf core distribution provides CodedInputStream for both C++ and Java. But it appears that protobuf does not provide such a tool for Python -- the 'internal' tools are not set up for this kind of external use:
https://groups.google.com/forum/?fromgroups#!topic/protobuf/xgmUqXVsK-o
So... before I go and cobble together a python varint parser and tools for parsing a stream of different message types: does anyone know of any tools for this?
Why is it missing from protobuf? (Or am I just failing to find it?)
This seems like a big gap for protobuf, especially when compared to thrift's equivalent tools for both 'transport' and 'protocol'. Am I viewing that correctly?
Recommended answer
It looks like the code in the other answer is potentially lifted from here. Check the licence before using this file but I managed to get it to read varint32s using code such as this:
import myprotocol_pb2 as proto  # your generated protobuf module
import varint  # (this is the varint.py file)

with open("filename.bin", "rb") as f:
    data = f.read()  # read the whole file as bytes

decoder = varint.decodeVarint32  # get a varint32 decoder
# others are available in varint.py

pos = 0
while pos < len(data):
    msg = proto.Msg()  # your message type
    # the decoder returns (decoded value, position just past the varint)
    size, pos = decoder(data, pos)
    msg.ParseFromString(data[pos:pos + size])
    # use parsed message
    pos += size
print("done!")
This is very simple code designed to load messages of a single type delimited by varint32s which describe the next message's size.
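The length-prefix framing described above can be sketched without any protobuf dependency. This is only an illustration, not the varint.py file mentioned in the answer: `encode_varint`, `decode_varint`, and `iter_frames` are hypothetical helper names, and raw byte strings stand in for serialized messages.

```python
def encode_varint(value):
    """Encode a non-negative int as a base-128 varint (hypothetical helper)."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data, pos):
    """Return (value, position just past the varint) starting at data[pos]."""
    result = 0
    shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated varint")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def iter_frames(data):
    """Yield each length-prefixed payload found in data."""
    pos = 0
    while pos < len(data):
        size, pos = decode_varint(data, pos)
        if pos + size > len(data):
            raise ValueError("truncated message")
        yield data[pos:pos + size]
        pos += size


# Build a small stream of three payloads and split it again.
payloads = [b"first", b"", b"x" * 300]
stream = b"".join(encode_varint(len(p)) + p for p in payloads)
frames = list(iter_frames(stream))
```

Each yielded payload would then be handed to `ParseFromString` on a fresh message instance, as in the loop above.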
Update: It may also be possible to include this file directly from the protobuf library by using:
from google.protobuf.internal.decoder import _DecodeVarint32
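A rough sketch of how that import might slot into the same loop. `_DecodeVarint32` is a private name, so it may change between protobuf releases. The fallback decoder and the `split_delimited` helper are assumptions added for illustration, so that the snippet also runs where protobuf is not installed.

```python
try:
    # Private helper from the protobuf package; called as decoder(buffer, pos).
    from google.protobuf.internal.decoder import _DecodeVarint32
except ImportError:
    def _DecodeVarint32(buffer, pos):
        """Illustrative stand-in: returns (value, new position)."""
        result, shift = 0, 0
        while True:
            byte = buffer[pos]
            pos += 1
            result |= (byte & 0x7F) << shift
            if not byte & 0x80:
                return result, pos
            shift += 7


def split_delimited(data):
    """Return the serialized messages found in a length-prefixed buffer."""
    messages = []
    pos = 0
    while pos < len(data):
        size, pos = _DecodeVarint32(data, pos)
        messages.append(data[pos:pos + size])
        pos += size
    return messages


# Two tiny payloads, each prefixed by a one-byte varint length.
chunks = split_delimited(b"\x03abc\x02hi")
```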