Cast array of bytes to POD

Views: 56 · Published: 2021/4/19 20:52:02 · Tags: c++, strict-aliasing

Question

Let's say I have an array of unsigned chars that represents a bunch of POD objects (e.g. read from a socket or via mmap). Which types they represent and at what positions is determined at runtime, but we assume that each object is already properly aligned.

What is the best way to "cast" those bytes to the respective POD type?

A solution should either comply with the C++ standard (let's say >= C++11) or at least be guaranteed to work with g++ >= 4.9, clang++ >= 3.5 and MSVC >= 2015U3, on Linux and Windows, running on x86/x64 or 32/64-bit ARM.

Ideally I'd like to do something like this:

uint8_t buffer[100]; //filled e.g. from network

switch(buffer[0]) {
    case 0: process(*reinterpret_cast<Pod1*>(&buffer[4])); break;
    case 1: process(*reinterpret_cast<Pod2*>(&buffer[8+buffer[1]*4])); break;
    //...
}

Or:

switch(buffer[0]) {
    case 0: {
         auto* ptr = new(&buffer[4]) Pod1; 
         process(*ptr); 
    }break;
    case 1: {
         auto* ptr = new(&buffer[8+buffer[1]*4]) Pod2; 
         process(*ptr); 
    }break;
    //...
}

Both seem to work, but both are AFAIK undefined behavior in c++¹⁾. And just for completeness: I'm aware of the "usual" solution to just copy the stuff into an appropriate local variable:

 Pod1 tmp;
 std::copy_n(&buffer[4],sizeof(tmp), reinterpret_cast<uint8_t*>(&tmp));             
 process(tmp);

In some situations the copy adds no overhead, in others it does, and in some it might even be faster. Performance aside, though, I can no longer do things like modify the data in place. And to be honest, it just annoys me to know that I have the right bits at an appropriate location in memory but can't use them.

A somewhat crazy solution I came up with is this:

template<class T>
T* inplace_cast(uint8_t* data) {
    //checks omitted for brevity
    T tmp;
    std::memmove((uint8_t*)&tmp, data, sizeof(tmp));
    auto ptr = new(data) T;
    std::memmove(ptr, (uint8_t*)&tmp,  sizeof(tmp));
    return ptr;
}

g++ and clang++ seem to be able to optimize away those copies, but I think this puts a lot of burden on the optimizer and might cause other optimizations to fail. It also doesn't work with const uint8_t* (although I don't actually want to modify the data), and it just looks horrible (I don't think you would get that past code review).

¹⁾ The first one is UB because it breaks strict aliasing, the second one is probably UB (discussed here) because the standard just says that the resulting object is not initialized and has indeterminate value (instead of guaranteeing that the underlying memory is untouched). I believe the first one's equivalent c-code is well defined, so compilers might allow this for compatibility with c-headers, but I'm unsure of this.

Recommended answer

The most correct way is to create a (temporary) variable of the desired POD class, and to use memcpy() to copy data from the buffer into that variable:

switch(buffer[0]) {
    case 0: {
        Pod1 var;
        std::memcpy(&var, &buffer[4], sizeof var);
        process(var);
        break;
    }
    case 1: {
        Pod2 var;
        std::memcpy(&var, &buffer[8 + buffer[1] * 4], sizeof var);
        process(var);
        break;
    }
    //...
}

The main reason for doing this is alignment: the data in the buffer may not be aligned correctly for the POD type you are using. Making a copy eliminates this problem. It also allows you to keep using the variable even if the network buffer is no longer available.
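The copy-based approach can be wrapped in a small helper so that each case of the switch stays short. The following is only a sketch: read_pod and the layout of Pod1 are hypothetical names and fields chosen for illustration, not something the question defines, and the caller is assumed to have checked that enough bytes are available.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical stand-in for the question's Pod1; the real layout is unknown.
struct Pod1 {
    std::uint32_t id;
    std::uint16_t flags;
    std::uint16_t length;
};

// Copies sizeof(T) bytes starting at `src` into a fresh T and returns it.
// `src` may have any alignment. The caller must guarantee that at least
// sizeof(T) bytes are readable at `src`.
template <class T>
T read_pod(const std::uint8_t* src) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy is only valid for trivially copyable types");
    T value;
    std::memcpy(&value, src, sizeof value);
    return value;
}
```

With this helper, a case becomes something like process(read_pod<Pod1>(&buffer[4])); and it also works on a const buffer, since the source is never written to.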

Only if you are absolutely sure that the data is properly aligned can you use the first solution you gave.
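If alignment is in doubt, it can be checked at runtime before taking the in-place path. This is a rough sketch: is_aligned_for is a hypothetical helper name, and it assumes that converting a pointer to std::uintptr_t yields the numeric address, which holds on the mainstream platforms listed in the question but is implementation-defined. It checks alignment only and says nothing about the strict-aliasing concern raised in the question.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if `p` is suitably aligned for an object of type T.
// Assumes the pointer-to-integer conversion exposes the raw address.
template <class T>
bool is_aligned_for(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}
```

A caller could fall back to the memcpy-based copy whenever this check fails.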

(If you are reading in data from the network, you should always check that the data is valid first, and that you won't read outside of your buffer. For example with &buffer[8 + buffer[1] * 4], you should check that the start of that address plus the size of Pod2 does not exceed the buffer length. Luckily you are using uint8_t, otherwise you'd also have to check that buffer[1] is not negative.)
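The bounds check described above might look like the following sketch. The helper names fits_in_buffer and pod2_offset are made up for illustration; the offset formula 8 + buffer[1] * 4 is the one from the question. The comparison is written so that the size arithmetic cannot overflow.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns true if `size` bytes starting at `offset` lie entirely inside
// a buffer of `buffer_len` bytes. Written to avoid overflow in offset + size.
inline bool fits_in_buffer(std::size_t offset, std::size_t size,
                           std::size_t buffer_len) {
    return offset <= buffer_len && size <= buffer_len - offset;
}

// Offset of the second record type, as computed in the question.
inline std::size_t pod2_offset(const std::uint8_t* buffer) {
    return 8u + static_cast<std::size_t>(buffer[1]) * 4u;
}
```

Before copying a Pod2, one would then test fits_in_buffer(pod2_offset(buffer), sizeof(Pod2), sizeof buffer) and reject the message if it returns false.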
