Cap'n Proto maximum file size

Question

At the moment we are using Protocol Buffers to exchange data between Python and C++. However, we are running into the maximum file size limitation of Protocol Buffers and are considering switching everything to Cap'n Proto. Since Cap'n Proto is somewhat related to Protocol Buffers, I was wondering: does Cap'n Proto also have a limit on the maximum file size?

Recommended answer

Cap'n Proto has a maximum file size of approximately 2^64 bytes, or 16 exbibytes -- which "should be enough for anyone". :)

Cap'n Proto is in fact an excellent format for extremely large data files, because it supports random access and lazy loading. When reading a huge Cap'n Proto file, I recommend using mmap() to map the file into memory, then passing the bytes directly to the Cap'n Proto implementation (e.g. capnp::FlatArrayMessageReader in C++). This way, only the pages of the file that you actually use will be brought into memory by the operating system. (In contrast, with Protocol Buffers, it is necessary to parse the entire file upfront into in-memory data structures before you can access any of it.)
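A minimal sketch of the mmap() step, assuming a POSIX system. `MappedFile` is a hypothetical helper, not part of Cap'n Proto; the actual capnp calls appear only in a comment because they need the library installed. The file is assumed to have been written with the standard framing (e.g. by `capnp::writeMessageToFd`). mmap() returns page-aligned memory, which satisfies the 8-byte word alignment Cap'n Proto expects.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <stdexcept>
#include <string>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: read-only memory mapping of a whole file.
// The OS pages bytes in lazily, only when they are first touched.
class MappedFile {
 public:
  explicit MappedFile(const std::string& path) {
    fd_ = ::open(path.c_str(), O_RDONLY);
    if (fd_ < 0) throw std::runtime_error("open failed: " + path);
    struct stat st;
    if (::fstat(fd_, &st) != 0) {
      ::close(fd_);
      throw std::runtime_error("fstat failed: " + path);
    }
    size_ = static_cast<size_t>(st.st_size);
    if (size_ > 0) {
      void* p = ::mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fd_, 0);
      if (p == MAP_FAILED) {
        ::close(fd_);
        throw std::runtime_error("mmap failed: " + path);
      }
      data_ = p;
    }
  }
  ~MappedFile() {
    if (data_ != nullptr) ::munmap(data_, size_);
    if (fd_ >= 0) ::close(fd_);
  }
  MappedFile(const MappedFile&) = delete;
  MappedFile& operator=(const MappedFile&) = delete;

  const void* data() const { return data_; }
  size_t size() const { return size_; }
  // Cap'n Proto data is a sequence of 8-byte words.
  size_t wordCount() const { return size_ / 8; }

 private:
  int fd_ = -1;
  void* data_ = nullptr;
  size_t size_ = 0;
};

// With the capnp library available, the mapping could be handed over
// roughly like this (not compiled here; MySchema is hypothetical):
//
//   MappedFile file("data.bin");
//   kj::ArrayPtr<const capnp::word> words(
//       reinterpret_cast<const capnp::word*>(file.data()), file.wordCount());
//   capnp::FlatArrayMessageReader reader(words);
//   auto root = reader.getRoot<MySchema>();
```

Closing the descriptor right after mmap() would also be valid, since the mapping stays alive until munmap(); keeping it open here just keeps cleanup in one destructor.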

Note that an individual List value in a Cap'n Proto structure has a limit of 2^29-1 elements. Text and Data (strings and byte blobs) are special kinds of lists, so this implies that any single contiguous text or byte blob is limited to just under 512 MiB. However, you can have multiple such blobs, so larger data can still be stored in a single file by splitting it into pieces.
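The splitting arithmetic can be sketched as follows. `chunkSizes` is a hypothetical helper, not a Cap'n Proto API; it only computes how a large blob would be cut into pieces that each fit in one Data value. The pieces could then go into a field such as `chunks :List(Data)` (a hypothetical schema), roughly via `list.init(i, sizes[i])` on the generated list builder.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Per-list element limit from the answer: 2^29 - 1.
// For Data (a list of bytes) this is just under 512 MiB.
constexpr uint64_t kMaxListElements = (uint64_t{1} << 29) - 1;

// Hypothetical helper: split a blob of `totalBytes` into chunk sizes,
// each no larger than `maxChunk`, so every chunk fits in one Data value.
std::vector<uint64_t> chunkSizes(uint64_t totalBytes,
                                 uint64_t maxChunk = kMaxListElements) {
  assert(maxChunk > 0);
  std::vector<uint64_t> sizes;
  while (totalBytes > 0) {
    uint64_t n = totalBytes < maxChunk ? totalBytes : maxChunk;
    sizes.push_back(n);
    totalBytes -= n;
  }
  return sizes;
}
```

For example, a 1 GiB (2^30-byte) blob needs three chunks: two of 536,870,911 bytes and a final chunk of 2 bytes.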

Note also that most Cap'n Proto implementations by default impose a "traversal limit" when reading a Cap'n Proto structure in order to defend against malicious data containing pointer loops. Typically this defaults to 64 MiB. For larger data, you'll want to override the limit -- in C++, you'll want to pass a custom ReaderOptions (its traversalLimitInWords field, measured in 8-byte words) to the MessageReader constructor.
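A sketch of choosing a larger limit. In the C++ library the default `ReaderOptions::traversalLimitInWords` is 8 Mi words (64 MiB); `traversalLimitForFile` is a hypothetical helper, and its factor of 2 is an arbitrary headroom choice for trusted data that may be read more than once, since re-reading the same data counts against the limit again. The capnp calls are shown only in a comment.

```cpp
#include <cassert>
#include <cstdint>

// The traversal limit is counted in 8-byte words.
constexpr uint64_t kBytesPerWord = 8;
// C++ default: 8 * 1024 * 1024 words, i.e. 64 MiB.
constexpr uint64_t kDefaultTraversalLimitWords = 8 * 1024 * 1024;

// Hypothetical helper: a limit covering the whole file `factor` times,
// never lower than the library default.
uint64_t traversalLimitForFile(uint64_t fileBytes, uint64_t factor = 2) {
  assert(factor > 0);
  uint64_t words = (fileBytes + kBytesPerWord - 1) / kBytesPerWord;  // round up
  uint64_t limit = words * factor;
  return limit > kDefaultTraversalLimitWords ? limit
                                             : kDefaultTraversalLimitWords;
}

// With capnp available (not compiled here):
//   capnp::ReaderOptions options;
//   options.traversalLimitInWords = traversalLimitForFile(fileSizeInBytes);
//   capnp::FlatArrayMessageReader reader(words, options);
```

For a 1 GiB file this gives 2^28 words (2 GiB of traversal budget); small files simply keep the default.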