Why does a non-intrusive serialization add a 5-byte zero prefix?

Question

I am investigating a port from a non-standard to a standard string in an application that uses boost::archive. The non-standard string has its (de)serialization defined in the non-intrusive style, as shown in the example below. Serialization and deserialization work as expected, but when the ported application receives an old message, it crashes with a bad allocation. This is caused by 5 extra bytes (all zero) inserted before the size of the string.

What causes the insertion of these 5 extra bytes? Is this some kind of magic marker?

Example:

#include <iostream>
#include <string>
#include <sstream>
#include <boost/serialization/split_free.hpp>
#include <boost/serialization/string.hpp>  // std::string support, used by test_normal_string
#include <boost/archive/binary_oarchive.hpp>

struct own_string { // simplified custom string class
    std::string content;
};

namespace boost
{
    namespace serialization
    {
        template<class Archive>
        inline void save(
            Archive & ar,
            const own_string & t,
            const unsigned int /* file_version */)
        {
            size_t size = t.content.size();
            ar << size;
            ar.save_binary(&t.content[0], size);
        }

        template<class Archive>
        inline void load(
            Archive & ar,
            own_string & t,
            const unsigned int /* file_version */)
        {
            size_t size;
            ar >> size;
            t.content.resize(size);
            ar.load_binary(&t.content[0], size);
        }

        // split the non-intrusive serialize function into separate
        // non-intrusive save/load functions
        template<class Archive>
        inline void serialize(
            Archive & ar,
            own_string & t,
            const unsigned int file_version)
        {
            boost::serialization::split_free(ar, t, file_version);
        }

    } // namespace serialization
} // namespace boost

std::string string_to_hex(const std::string& input)
{
    static const char* const lut = "0123456789ABCDEF";
    size_t len = input.length();

    std::string output;
    output.reserve(2 * len);
    for (size_t i = 0; i < len; ++i)
    {
        const unsigned char c = input[i];
        output.push_back(lut[c >> 4]);
        output.push_back(lut[c & 15]);
    }
    return output;
}

void test_normal_string()
{
    std::stringstream ss;
    boost::archive::binary_oarchive ar{ss};

    std::string test = "";

    std::cout << string_to_hex(ss.str()) << std::endl;
    ar << test;

    //adds 00 00 00 00 00 00 00 00
    std::cout << string_to_hex(ss.str()) << std::endl;
}

void test_own_string()
{
    std::stringstream ss;
    boost::archive::binary_oarchive ar{ss};

    std::string test = "";

    own_string otest{test};
    std::cout << string_to_hex(ss.str()) << std::endl;
    ar << otest;

    //adds 00 00 00 00 00 00 00 00 00 00 00 00 00
    std::cout << string_to_hex(ss.str()) << std::endl;
}

int main()
{
    test_normal_string();
    test_own_string();
}

Recommended answer

So, you'd want to deserialize a previously serialized own_string as if it were a std::string.
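Doing this naively ends in a bad allocation rather than just wrong data. The small model below shows why, in plain C++ with no Boost. It assumes, as the question's hex output suggests, a little-endian 8-byte size field that old messages precede with 5 zero bytes. The helpers make_old_message, misread_size and correct_size are hypothetical illustrations, not part of Boost.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Plain-C++ model, NOT Boost code. Assumes a little-endian 8-byte size
// field and, in old messages, a 5-byte all-zero class-info prefix.

// Read an 8-byte little-endian unsigned integer starting at `pos`.
std::uint64_t read_u64_le(const std::string& buf, std::size_t pos) {
    assert(pos + 8 <= buf.size());
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < 8; ++i) {
        const auto byte = static_cast<unsigned char>(buf[pos + i]);
        value |= static_cast<std::uint64_t>(byte) << (8 * i);
    }
    return value;
}

// Hypothetical "old" own_string message: 5 zero bytes of class
// information, then the 8-byte size, then the characters.
std::string make_old_message(const std::string& content) {
    std::string out(5, '\0');
    const std::uint64_t size = content.size();
    for (std::size_t i = 0; i < 8; ++i) {
        out.push_back(static_cast<char>((size >> (8 * i)) & 0xFF));
    }
    out += content;
    return out;
}

// Size decoded by a reader that expects std::string's layout (no prefix).
std::uint64_t misread_size(const std::string& old_message) {
    return read_u64_le(old_message, 0);
}

// Size decoded by a reader that skips the 5-byte prefix.
std::uint64_t correct_size(const std::string& old_message) {
    return read_u64_le(old_message, 5);
}
```

Take the message for "hello". correct_size returns 5, but misread_size returns 5 << 40 (about 5.5 × 10^12). The reason is that the first three bytes of the real size land in the high-order positions of the misread value. Resizing a string to that length is the kind of request that fails with std::bad_alloc.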

From the Boost (1.65.1) documentation:

By default, for each class serialized, class information is written to the archive. This information includes version number, implementation level and tracking behavior. This is necessary so that the archive can be correctly deserialized even if a subsequent version of the program changes some of the current trait values for a class. The space overhead for this data is minimal. There is a little bit of runtime overhead since each class has to be checked to see if it has already had its class information included in the archive. In some cases, even this might be considered too much. This extra overhead can be eliminated by setting the implementation level class trait to: boost::serialization::object_serializable.

Now, probably(*) this is the default for standard classes. In fact, adding

BOOST_CLASS_IMPLEMENTATION(own_string, boost::serialization::object_serializable)

at global scope makes test_normal_string and test_own_string produce the same bytes. This should explain the 5 extra bytes you observed.
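To picture the two layouts, the sketch below models them in plain C++ rather than calling Boost. It assumes a little-endian 8-byte size field. It also assumes that the class information in a binary archive is a 1-byte tracking flag followed by a 4-byte version number. That assumption matches the 5 zero bytes in the question, but it is not stated in the quoted documentation. The names append_le, encode_without_class_info and encode_with_class_info are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Plain-C++ model of two binary layouts; NOT Boost's implementation.
// Assumptions: little-endian 8-byte size field; class information is a
// 1-byte tracking flag plus a 4-byte version (1 + 4 = 5 bytes).

// Append the low `width` bytes of `value` in little-endian order.
void append_le(std::string& out, std::uint64_t value, std::size_t width) {
    assert(width <= 8);
    for (std::size_t i = 0; i < width; ++i) {
        out.push_back(static_cast<char>((value >> (8 * i)) & 0xFF));
    }
}

// Layout without class information (what std::string gets):
// the size, then the characters.
std::string encode_without_class_info(const std::string& s) {
    std::string out;
    append_le(out, s.size(), 8);
    out += s;
    return out;
}

// Layout with class information, as for a class left at the default
// object_class_info level the first time it appears in an archive:
// tracking flag and version first, then the same data as above.
std::string encode_with_class_info(const std::string& s,
                                   bool tracked = false,
                                   std::uint32_t version = 0) {
    std::string out;
    append_le(out, tracked ? 1 : 0, 1);  // tracking flag (assumed 1 byte)
    append_le(out, version, 4);          // class version (assumed 4 bytes)
    out += encode_without_class_info(s);
    return out;
}
```

For an empty string the two functions return 8 and 13 bytes, the same counts the question's comments report for test_normal_string and test_own_string. The first 5 bytes of the longer encoding are all zero. Setting own_string to object_serializable corresponds to using the shorter layout.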

That said, I did not find any specific guarantee about the serialization traits of standard classes (others may know more than I do).

(*) Actually, the section on the portability of trait settings mentions that:

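Another way to avoid this problem is to assign serialization traits to all specializations of the template my_wrapper for all primitive types so that class information is never saved. This is what has been done for our implementation of serializations for STL collections.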

So this may give you enough confidence that standard collections (and hence std::string) produce the same bytes in this case.
