Stream while serializing with Cap'n'Proto

Problem description

Consider a Cap'n'Proto schema like this:

struct Document {
  header @0 :Header;
  records @1 :List(Record);  # usually a large number of records
  footer @2 :Footer;
}
struct Header {
  numberOfRecords @0 :UInt32;
  # some fields
}
struct Footer {
  # some fields
}
struct Record {
  type @0 :UInt32;
  desc @1 :Text;
  # some more fields, relatively large in total
}

Now I want to serialize (i.e. build) a document instance and stream it to a remote destination.

Since the document is usually very large, I don't want to build it completely in memory before sending it. Instead, I am looking for a builder that sends struct by struct directly over the wire, so that the additional memory buffer needed is constant, i.e. O(max(sizeof(Header), sizeof(Record), sizeof(Footer))).

Looking at the tutorial material, I can't find such a builder. The MallocMessageBuilder seems to create everything in memory first (then you call writeMessageToFd on it).

Does the Cap'n'Proto API support such a use-case?

Or is Cap'n'Proto more meant to be used for messages that fit into memory before sending?

In this example, the Document struct could be omitted and then one could just send a sequence of one Header message, n Record messages and one Footer message. Since a Cap'n'Proto message is self-delimiting, this should work. But you lose your document root, and perhaps sometimes this is not really an option.

Recommended answer

The solution you outlined -- sending the parts of the document as separate messages -- is probably best for your use case. Fundamentally, Cap'n Proto is not designed for streaming chunks of a single message, since that would not fit well with its random-access properties (e.g. what happens when you try to follow a pointer that points to a chunk you haven't received yet?). Instead, when you want streaming, you should split a large message into a series of smaller messages.
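A receiver can separate such a series of messages because the standard (unpacked) stream framing starts every message with a segment table: a 32-bit little-endian segment count minus one, one 32-bit size in words per segment, and padding up to an 8-byte boundary. The following is only a minimal sketch of that arithmetic, based on the framing described in the Cap'n Proto encoding documentation. The names framedMessageSize and readU32LE are hypothetical helpers, not library API, and in practice the library's own stream readers (for example capnp::StreamFdMessageReader) handle this for you.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reads a little-endian 32-bit unsigned integer from p.
static std::uint32_t readU32LE(const unsigned char* p) {
  assert(p != nullptr);
  return std::uint32_t(p[0]) | (std::uint32_t(p[1]) << 8) |
         (std::uint32_t(p[2]) << 16) | (std::uint32_t(p[3]) << 24);
}

// Hypothetical helper: returns the total size in bytes (segment table plus
// all segments) of the framed message that starts at buf. Returns 0 when
// `len` bytes are not yet enough to hold the complete segment table, so the
// caller should read more data and try again.
// Note: a real reader should also reject absurdly large segment counts.
std::size_t framedMessageSize(const unsigned char* buf, std::size_t len) {
  if (len < 4) return 0;
  std::uint64_t segmentCount = std::uint64_t(readU32LE(buf)) + 1;
  // 4 bytes for the count, 4 bytes per segment, rounded up to 8 bytes.
  std::uint64_t tableBytes = (4 + 4 * segmentCount + 7) / 8 * 8;
  if (len < tableBytes) return 0;
  std::uint64_t words = 0;
  for (std::uint64_t i = 0; i < segmentCount; ++i) {
    words += readU32LE(buf + 4 + 4 * i);
  }
  return static_cast<std::size_t>(tableBytes + 8 * words);
}
```

With this, a receiver knows where one message (say, the Header) ends and the next (the first Record) begins without parsing the message contents.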

That said, unlike other similar systems (e.g. Protobuf), Cap'n Proto does not strictly require messages to fit into memory. Specifically, you can do some tricks using mmap(2). If your document data is coming from a file on disk, you can mmap() the file into memory and then incorporate it into your message. With mmap(), the operating system does not actually read the data from disk until you attempt to access the memory, and the OS can also purge the pages from memory after they are accessed since it knows it still has a copy on disk. This often lets you write much simpler code, since you no longer need to think about memory management.

In order to incorporate an mmap()ed chunk into a Cap'n Proto message, you'll want to use capnp::Orphanage::referenceExternalData(). For example, given:

struct MyDocument {
  body @0 :Data;
  # (other fields)
}

You might write:

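#include <cerrno>
#include <sys/mman.h>
#include <capnp/message.h>
#include <capnp/orphan.h>
#include <kj/debug.h>

// Map file into memory. `fd` is an open file descriptor and `size` is the
// file's length in bytes; both are assumed to be set up by the caller.
void* ptr = mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
if (ptr == MAP_FAILED) {
  KJ_FAIL_SYSCALL("mmap", errno);
}
auto data = capnp::Data::Reader(
    reinterpret_cast<const kj::byte*>(ptr), size);

// Incorporate it into a message. The field is named `body`, so the
// generated adopt method is adoptBody().
capnp::MallocMessageBuilder message;
auto root = message.initRoot<MyDocument>();
root.adoptBody(
    message.getOrphanage().referenceExternalData(data));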

Because Cap'n Proto is zero-copy, it will end up writing the mmap()ed memory directly out to the socket without ever accessing it. It's then up to the OS to read the content from disk and out to the socket as appropriate.
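The effect can be pictured with plain POSIX calls. The sketch below is a stand-in for what happens underneath, not Cap'n Proto code: it maps a file and hands the mapped address straight to write(), so the process itself never reads the bytes and the kernel pages them in as needed. The function name sendFileViaMmap is hypothetical, and the error handling is deliberately minimal.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: maps the file at `path` read-only and writes its
// contents to outFd without copying them into a user-space buffer first.
// Returns true on success.
bool sendFileViaMmap(const char* path, int outFd) {
  assert(path != nullptr);
  int inFd = open(path, O_RDONLY);
  if (inFd < 0) return false;

  struct stat st;
  if (fstat(inFd, &st) != 0) { close(inFd); return false; }
  if (st.st_size == 0) { close(inFd); return true; }  // nothing to send

  std::size_t size = static_cast<std::size_t>(st.st_size);
  void* ptr = mmap(nullptr, size, PROT_READ, MAP_PRIVATE, inFd, 0);
  close(inFd);  // the mapping stays valid after the descriptor is closed
  if (ptr == MAP_FAILED) return false;

  const char* p = static_cast<const char*>(ptr);
  std::size_t remaining = size;
  bool ok = true;
  while (remaining > 0) {
    ssize_t n = write(outFd, p, remaining);
    if (n < 0) {
      if (errno == EINTR) continue;  // interrupted, retry
      ok = false;
      break;
    }
    p += n;
    remaining -= static_cast<std::size_t>(n);
  }
  munmap(ptr, size);
  return ok;
}
```

In the Cap'n Proto case the mapped region is one externally referenced segment of the message, and the library's write call plays the role of the write() loop here.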

Of course, you still have a problem on the receiving end. You'll find it a lot more difficult to design the receiving end to read into mmap()ed memory. One strategy might be to dump the entire stream directly to a file first (without involving the Cap'n Proto library), then mmap() that file and use capnp::FlatArrayMessageReader to read the mmap()ed data in-place.
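The "dump to a file, then map it" strategy might look roughly like the sketch below. It uses only POSIX calls with a fixed 64 KiB copy buffer, and the names MappedFile and spoolAndMap are hypothetical. Handing the mapped bytes to capnp::FlatArrayMessageReader is left as a comment, since that step depends on your generated schema types.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical result type: a read-only mapping of the spooled file.
struct MappedFile {
  const unsigned char* data;
  std::size_t size;
};

// Hypothetical helper: copies everything readable from inFd into the file
// at `path`, then maps that file read-only. Returns {nullptr, 0} on failure
// or if the stream was empty.
MappedFile spoolAndMap(int inFd, const char* path) {
  assert(path != nullptr);
  MappedFile result = {nullptr, 0};
  int fileFd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0600);
  if (fileFd < 0) return result;

  char buf[65536];  // constant-size buffer, independent of message size
  std::size_t total = 0;
  bool ok = true;
  while (ok) {
    ssize_t n = read(inFd, buf, sizeof(buf));
    if (n == 0) break;  // end of stream
    if (n < 0) {
      if (errno == EINTR) continue;
      ok = false;
      break;
    }
    ssize_t off = 0;
    while (off < n) {
      ssize_t w = write(fileFd, buf + off, static_cast<std::size_t>(n - off));
      if (w < 0) {
        if (errno == EINTR) continue;
        ok = false;
        break;
      }
      off += w;
    }
    if (ok) total += static_cast<std::size_t>(n);
  }

  if (ok && total > 0) {
    void* ptr = mmap(nullptr, total, PROT_READ, MAP_PRIVATE, fileFd, 0);
    if (ptr != MAP_FAILED) {
      result.data = static_cast<const unsigned char*>(ptr);
      result.size = total;
      // Next step (not shown): view these bytes as an array of capnp::word
      // and pass it to capnp::FlatArrayMessageReader to read in place.
    }
  }
  close(fileFd);  // the mapping stays valid after the descriptor is closed
  return result;
}
```

The caller is responsible for calling munmap() on the returned region once it is done with the reader.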

I describe all this because it's a neat thing that is possible with Cap'n Proto but not most other serialization frameworks (e.g. you couldn't do this with Protobuf). Playing tricks with mmap() is sometimes really useful -- I've used this successfully in several places in Sandstorm, Cap'n Proto's parent project. However, I suspect that for your use case, splitting the document into a series of messages probably makes more sense.
