Do cereal and Boost Serialization use zero-copy?


Question

I have run some performance comparisons between several serialization libraries, including FlatBuffers, Cap'n Proto, Boost Serialization and cereal. All the tests are written in C++.

I know that FlatBuffers and Cap'n Proto use zero-copy. With zero-copy, serialization time is essentially zero, but the serialized objects are larger.

I thought that cereal and Boost Serialization didn't use zero-copy. However, their serialization time (for int and double) is nearly zero, and the size of their serialized objects is nearly the same as with Cap'n Proto or FlatBuffers. I didn't find any information about zero-copy in their documentation.

Do cereal and Boost Serialization use zero-copy too?

Recommended answer

Boost and cereal do not implement zero-copy in the sense of Cap'n Proto or FlatBuffers.

With true zero-copy serialization, the backing store for your live in-memory objects is in fact exactly the same memory segment that is passed to the read() or write() system calls. There is no packing/unpacking step at all.

Generally, this has a number of implications:

  • Objects are not allocated using new/delete. When constructing a message, you allocate the message first, which allocates a long contiguous memory space for the message contents. You then allocate the message structure directly inside the message, receiving pointers that in fact point into the message's memory. When the message is later written, a single write() call shoves this whole memory space out to the wire.
  • Similarly, when you read in a message, a single read() call (or maybe 2-3) reads in the entire message into one block of memory. You then get a pointer (or, a pointer-like object) to the "root" of the message, which you can use to traverse it. Note that no part of the message is actually inspected until your application traverses it.
  • With normal sockets, the only copies of your data happen in kernel space. With RDMA networking, you may even be able to avoid kernel-space copies: the data comes off the wire directly into its final memory location.
  • When working with files (rather than networks) it's possible to mmap() a very large message directly from disk and use the mapped memory region directly. Doing so is O(1) -- it doesn't matter how big the file is. Your operating system will automatically page in the necessary parts of the file when you actually access them.
  • Two processes on the same machine can communicate through shared memory segments with no copies. Note that, generally, regular old C++ objects do not work well in shared memory, because the memory segments usually don't have the same address in both memory spaces, thus all the pointers are wrong. With a zero-copy serialization framework, the pointers are usually expressed as offsets rather than absolute addresses, so that they are position-independent.
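The last point, offsets instead of absolute addresses, can be illustrated with a minimal sketch. The `Root` layout and the `build_message` / `read_text` helpers below are hypothetical and are not the wire format of Cap'n Proto or FlatBuffers; real frameworks also handle alignment, endianness and validation, which this sketch mostly ignores.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical layout: the root record refers to its text payload by an
// offset from the start of the buffer, never by an absolute address.
struct Root {
    std::uint32_t text_offset;  // offset from buffer start to the text bytes
    std::uint32_t text_length;  // number of text bytes (no terminator)
    std::int32_t  id;
};

// A non-owning view into the message buffer.
struct TextView {
    const char* data;
    std::size_t size;
};

// Build the whole message inside one contiguous buffer: Root, then the text.
// A single write() of this buffer would put the entire message on the wire.
std::vector<unsigned char> build_message(std::int32_t id, const char* text) {
    const std::uint32_t len = static_cast<std::uint32_t>(std::strlen(text));
    std::vector<unsigned char> buf(sizeof(Root) + len);
    Root root{static_cast<std::uint32_t>(sizeof(Root)), len, id};
    std::memcpy(buf.data(), &root, sizeof(Root));
    std::memcpy(buf.data() + sizeof(Root), text, len);
    return buf;
}

// Read the text without unpacking it: only the small fixed-size header is
// copied out (to sidestep alignment concerns); the returned view points
// straight into the buffer the bytes arrived in.
TextView read_text(const unsigned char* base, std::size_t size) {
    assert(size >= sizeof(Root));
    Root root;
    std::memcpy(&root, base, sizeof(Root));
    assert(static_cast<std::size_t>(root.text_offset) + root.text_length <= size);
    return TextView{reinterpret_cast<const char*>(base) + root.text_offset,
                    root.text_length};
}
```

Because the reference is an offset, the same bytes can be copied, mapped or shared at any address and `read_text` still resolves the payload correctly, which is what makes such a layout usable across two processes' address spaces.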

Boost and cereal are different: when you receive a message in these systems, first a pass is performed over the entire message to "unpack" the contents. The final resting place of the data is in objects allocated in the traditional way using new/delete. Similarly, when sending a message, the data has to be collected from this tree of objects and packed together into one buffer in order to be written out. Even though Boost and cereal are "extensible", being truly zero-copy requires a very different underlying design; it cannot be bolted on as an extension.
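For contrast, here is a rough sketch of the pack/unpack style described above. The `Person` type and the `pack` / `unpack` functions are hypothetical and do not use the Boost or cereal APIs; they only show where the copies happen.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical application object; its string lives in separately
// allocated storage managed by std::string.
struct Person {
    std::int32_t id;
    std::string name;
};

// Packing: gather the fields from the object into one output buffer.
std::vector<unsigned char> pack(const Person& p) {
    const std::uint32_t len = static_cast<std::uint32_t>(p.name.size());
    std::vector<unsigned char> out(sizeof(p.id) + sizeof(len) + len);
    unsigned char* cur = out.data();
    std::memcpy(cur, &p.id, sizeof(p.id));  cur += sizeof(p.id);
    std::memcpy(cur, &len, sizeof(len));    cur += sizeof(len);
    std::memcpy(cur, p.name.data(), len);   // copy #1: object -> buffer
    return out;
}

// Unpacking: one pass over the bytes, copying into a freshly built object.
Person unpack(const std::vector<unsigned char>& in) {
    Person p;
    std::uint32_t len = 0;
    assert(in.size() >= sizeof(p.id) + sizeof(len));
    const unsigned char* cur = in.data();
    std::memcpy(&p.id, cur, sizeof(p.id));  cur += sizeof(p.id);
    std::memcpy(&len, cur, sizeof(len));    cur += sizeof(len);
    assert(in.size() - (sizeof(p.id) + sizeof(len)) >= len);
    p.name.assign(reinterpret_cast<const char*>(cur), len);  // copy #2: buffer -> object
    return p;
}
```

For plain int and double fields these copies are a handful of bytes each, which is consistent with the near-zero timings reported in the question even though the design is not zero-copy.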

That said, don't assume zero-copy will always be faster. memcpy() can be pretty fast, and the rest of your program may dwarf the cost. Meanwhile, zero-copy systems tend to have inconvenient APIs, particularly because of the restrictions on memory allocation. It may be overall a better use of your time to use a traditional serialization system.

The place where zero-copy is most obviously advantageous is when manipulating files, since as I mentioned you can easily mmap() a huge file and only read part of it. Non-zero-copy formats simply can't do that. When it comes to networking, though, the advantages are less clear, since the network communication itself is necessarily O(n).
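A minimal sketch of the mmap() approach, assuming a POSIX system. `MappedFile`, `map_file` and `unmap_file` are hypothetical helper names introduced for illustration; only `open`, `fstat`, `mmap`, `munmap` and `close` are real system calls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper type: a read-only mapping of a whole file.
struct MappedFile {
    const unsigned char* data;
    std::size_t size;
};

// Map `path` read-only. Returns {nullptr, 0} on failure or for an empty file.
// Setting up the mapping does not read the file's contents; pages are
// faulted in by the operating system only when they are first touched.
MappedFile map_file(const char* path) {
    MappedFile m{nullptr, 0};
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        std::perror("open");
        return m;
    }
    struct stat st;
    if (fstat(fd, &st) == 0 && st.st_size > 0) {
        void* p = mmap(nullptr, static_cast<std::size_t>(st.st_size),
                       PROT_READ, MAP_PRIVATE, fd, 0);
        if (p != MAP_FAILED) {
            m.data = static_cast<const unsigned char*>(p);
            m.size = static_cast<std::size_t>(st.st_size);
        }
    }
    close(fd);  // the mapping remains valid after the descriptor is closed
    return m;
}

// Release the mapping created by map_file.
void unmap_file(const MappedFile& m) {
    if (m.data != nullptr) {
        munmap(const_cast<unsigned char*>(m.data), m.size);
    }
}
```

Reading `m.data[offset]` for a single offset touches only the page containing that byte, so a zero-copy format stored in the file could be traversed without loading the rest of it.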

At the end of the day, if you really want to know which serialization system is fastest for your use case, you will probably need to try them all and measure them. Note that toy benchmarks are usually misleading; you need to test your actual use case (or something very similar) to get useful information.

Disclosure: I am the author of Cap'n Proto (a zero-copy serializer) and Protocol Buffers v2 (a popular non-zero-copy serializer).
