How does one properly deserialize a byte array back into an object in C++?

Question

My team has been having this issue for a few weeks now, and we're a bit stumped. Kindness and knowledge would be gratefully received!

Working with an embedded system, we are attempting to serialize an object, send it through a Linux socket, receive it in another process, and deserialize it back into the original object. We have the following deserialization function:

 /*! Takes a byte array and populates the object's data members */
std::shared_ptr<Foo> Foo::unmarshal(uint8_t *serialized, uint32_t size)
{
  auto msg = reinterpret_cast<Foo *>(serialized);
  return std::shared_ptr<ChildOfFoo>(
        reinterpret_cast<ChildOfFoo *>(serialized));
}

The object is successfully deserialized and can be read from. However, when the destructor for the returned std::shared_ptr<Foo> is called, the program segfaults. Valgrind gives the following output:

==1664== Process terminating with default action of signal 11 (SIGSEGV)
==1664==  Bad permissions for mapped region at address 0xFFFF603800003C88
==1664==    at 0xFFFF603800003C88: ???
==1664==    by 0x42C7C3: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (shared_ptr_base.h:149)
==1664==    by 0x42BC00: std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count() (shared_ptr_base.h:666)
==1664==    by 0x435999: std::__shared_ptr<ChildOfFoo, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr() (shared_ptr_base.h:914)
==1664==    by 0x4359B3: std::shared_ptr<ChildOfFoo>::~shared_ptr() (shared_ptr.h:93)

We're open to any suggestions at all! Thank you for your time :)

Recommended answer

In general, this won't work:

auto msg = reinterpret_cast<Foo *>(serialized);

You can't just take an arbitrary array of bytes and pretend it's a valid C++ object (even if reinterpret_cast<> allows you to compile code that attempts to do so). For one thing, any C++ object that contains at least one virtual method will contain a vtable pointer, which points to the virtual-methods table for that object's class, and is used whenever a virtual method is called. But if you serialize that pointer on computer A, then send it across the network and deserialize and then try to use the reconstituted object on computer B, you'll invoke undefined behavior because there is no guarantee that that class's vtable will exist at the same memory location on computer B that it did on computer A. Also, any class that does any kind of dynamic memory allocation (e.g. any string class or container class) will contain pointers to other objects that it allocated, and that will lead you to the same invalid-pointer problem.
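One way to catch this kind of mistake early is to ask the compiler whether a type may legally be copied as raw bytes at all. The following is a minimal sketch, assuming C++11 or later; the types PlainHeader, WithVirtual and WithString and the helper copyRawBytes() are hypothetical names made up for illustration. Passing the check is necessary but not sufficient: it says nothing about padding, word size or byte order.

```cpp
#include <cstdint>
#include <cstring>
#include <string>
#include <type_traits>

// Hypothetical types, for illustration only.
struct PlainHeader { std::uint32_t id; std::uint16_t length; };
struct WithVirtual { virtual ~WithVirtual() {} std::uint32_t id; };  // has a vtable pointer
struct WithString  { std::string name; };                            // owns heap memory

// Only the first type may be treated as a bag of bytes.
static_assert(std::is_trivially_copyable<PlainHeader>::value,
              "plain data can be copied byte-by-byte");
static_assert(!std::is_trivially_copyable<WithVirtual>::value,
              "virtual methods rule out raw byte copies");
static_assert(!std::is_trivially_copyable<WithString>::value,
              "owning members rule out raw byte copies");

// Hypothetical helper: refuses to compile for types like WithVirtual.
// (out) must point to at least sizeof(T) writable bytes.
template <class T>
void copyRawBytes(const T &obj, std::uint8_t *out)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "T cannot be copied as raw bytes");
    std::memcpy(out, &obj, sizeof(T));
}
```

Calling copyRawBytes() on a WithVirtual or WithString object is rejected at compile time instead of crashing later in a destructor.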

But let's say you've limited your serializations to only POD (plain old data) objects that contain no pointers. Will it work then? The answer is: possibly, in very specific cases, but it will be very fragile. The reason for that is that the compiler is free to lay out the class's member variables in memory in different ways, and it will insert padding differently on different hardware (or even with different optimization settings, sometimes), leading to a situation where the bytes that represent a particular Foo object on computer A are different from the bytes that would represent that same object on computer B. On top of that you may have to worry about different word-lengths on different computers (e.g. long is 32-bit on some architectures and 64-bit on others), and different endian-ness (e.g. Intel CPUs represent values in little-endian form while PowerPC CPUs typically represent them in big-endian). Any one of these differences will cause your receiving computer to misinterpret the bytes it received and thereby corrupt your data badly.
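To make the word-length and endian-ness points concrete, here is a small sketch of writing a value in one agreed byte order using a fixed-width type. The helpers putBE32() and getBE32() are hypothetical names, not part of any library; they use shifts rather than copying the integer's in-memory bytes, so the result should be the same on little-endian and big-endian hosts.

```cpp
#include <cstdint>

// Hypothetical helper: store (v) into out[0..3], most significant byte first.
inline void putBE32(std::uint8_t *out, std::uint32_t v)
{
    out[0] = static_cast<std::uint8_t>(v >> 24);
    out[1] = static_cast<std::uint8_t>(v >> 16);
    out[2] = static_cast<std::uint8_t>(v >> 8);
    out[3] = static_cast<std::uint8_t>(v);
}

// Hypothetical helper: read back a value written by putBE32().
inline std::uint32_t getBE32(const std::uint8_t *in)
{
    return (static_cast<std::uint32_t>(in[0]) << 24) |
           (static_cast<std::uint32_t>(in[1]) << 16) |
           (static_cast<std::uint32_t>(in[2]) << 8)  |
            static_cast<std::uint32_t>(in[3]);
}
```

Using std::uint32_t instead of long pins the field to exactly four bytes on every platform that provides the type, which avoids the 32-bit versus 64-bit mismatch described above.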

So the remaining part of the question is, what is the proper way to serialize/deserialize a C++ object? And the answer is: you have to do it the hard way, by writing a routine for each class that does the serialization member-variable by member-variable, taking the class's particular semantics into account. For example, here are some methods that you might have your serializable classes define:

// Serialize this object's state out into (buffer)
// (buffer) must point to at least FlattenedSize() bytes of writeable space
void Flatten(uint8_t *buffer) const;

// Return the number of bytes this object will require to serialize
size_t FlattenedSize() const;

// Set this object's state from the bytes in (buffer)
// Returns true on success, or false on failure
bool Unflatten(const uint8_t *buffer, size_t size);

... and here's an example of a simple x/y point class that implements the methods:

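#include <arpa/inet.h>  // htonl(), ntohl()
#include <stddef.h>     // size_t
#include <stdint.h>     // int32_t, uint32_t, uint8_t
#include <string.h>     // memcpy()

class Point
{
public:
    Point() : m_x(0), m_y(0) {/* empty */}
    Point(int32_t x, int32_t y) : m_x(x), m_y(y) {/* empty */}

    // Writes m_x, then m_y, each as a 32-bit big-endian value
    void Flatten(uint8_t *buffer) const
    {
       const uint32_t beX = htonl(static_cast<uint32_t>(m_x));
       memcpy(buffer, &beX, sizeof(beX));
       buffer += sizeof(beX);

       const uint32_t beY = htonl(static_cast<uint32_t>(m_y));
       memcpy(buffer, &beY, sizeof(beY));
    }

    size_t FlattenedSize() const {return sizeof(m_x) + sizeof(m_y);}

    bool Unflatten(const uint8_t *buffer, size_t size)
    {
       // Refuse to read past the end of a short buffer
       if (size < FlattenedSize()) return false;

       uint32_t beX;
       memcpy(&beX, buffer, sizeof(beX));
       m_x = static_cast<int32_t>(ntohl(beX));

       buffer += sizeof(beX);
       uint32_t beY;
       memcpy(&beY, buffer, sizeof(beY));
       m_y = static_cast<int32_t>(ntohl(beY));

       return true;
    }

    int32_t m_x;
    int32_t m_y;
};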

... then your unmarshal function could look like this (note I've made it templated so that it will work for any class that implements the above methods):

/*! Takes a byte array and populates the object's data members */
template<class T> std::shared_ptr<T> unmarshal(const uint8_t *serialized, size_t size)
{
    auto sp = std::make_shared<T>();
    if (sp->Unflatten(serialized, size) == true) return sp;
 
    // Oops, Unflatten() failed!  handle the error somehow here
    [...]
}
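For a sense of how the pieces fit together end to end, here is a hedged round-trip sketch. It uses a condensed stand-in for the Point class, and it assumes one possible error policy for the part left open above: returning an empty shared_ptr on failure. The names unmarshalOrNull() and roundTrip() are hypothetical. Because the returned shared_ptr owns an object created by std::make_shared, its destructor frees memory it actually allocated, unlike a shared_ptr wrapped around the receive buffer.

```cpp
#include <arpa/inet.h>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <vector>

// Condensed stand-in for the Point class (same three methods).
struct Point
{
    std::int32_t m_x = 0;
    std::int32_t m_y = 0;

    std::size_t FlattenedSize() const { return 2 * sizeof(std::uint32_t); }

    void Flatten(std::uint8_t *buffer) const
    {
        const std::uint32_t be[2] = { htonl(static_cast<std::uint32_t>(m_x)),
                                      htonl(static_cast<std::uint32_t>(m_y)) };
        std::memcpy(buffer, be, sizeof(be));
    }

    bool Unflatten(const std::uint8_t *buffer, std::size_t size)
    {
        if (size < FlattenedSize()) return false;  // never read past the end
        std::uint32_t be[2];
        std::memcpy(be, buffer, sizeof(be));
        m_x = static_cast<std::int32_t>(ntohl(be[0]));
        m_y = static_cast<std::int32_t>(ntohl(be[1]));
        return true;
    }
};

// Assumption: failure is reported as an empty shared_ptr.
template <class T>
std::shared_ptr<T> unmarshalOrNull(const std::uint8_t *serialized, std::size_t size)
{
    auto sp = std::make_shared<T>();
    if (sp->Unflatten(serialized, size)) return sp;
    return std::shared_ptr<T>();
}

// Hypothetical check: flatten a Point, rebuild it, and compare the fields.
bool roundTrip(std::int32_t x, std::int32_t y)
{
    Point src;
    src.m_x = x;
    src.m_y = y;

    std::vector<std::uint8_t> wire(src.FlattenedSize());
    src.Flatten(wire.data());  // these bytes are what would go over the socket

    auto dst = unmarshalOrNull<Point>(wire.data(), wire.size());
    return dst && dst->m_x == x && dst->m_y == y;
}
```

Callers then test the returned pointer before use; throwing an exception or returning an error code would be equally reasonable policies.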

If this seems like a lot of work compared to just grabbing the raw memory bytes of your class object and sending them verbatim across the wire, you're right -- it is. But this is what you have to do if you want the serialization to work reliably and not break every time you upgrade your compiler, or change your optimization flags, or want to communicate between computers with different CPU architectures. If you'd rather not do this sort of thing by hand, there are pre-packaged libraries to assist with (partially) automating the process, such as Google's Protocol Buffers library, or even good old XML.
