reinterpret_cast, char*, and undefined behavior

Question

What are the cases where reinterpret_casting a char* (or char[N]) is undefined behavior, and when is it defined behavior? What is the rule of thumb I should be using to answer this question?

As we learned from this question, the following is undefined behavior:

alignas(int) char data[sizeof(int)];
int *myInt = new (data) int;           // OK
*myInt = 34;                           // OK
int i = *reinterpret_cast<int*>(data); // <== UB! have to use std::launder
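
For reference, here is a minimal sketch of what the std::launder comment above is pointing at, assuming C++17 or later. The function name read_via_launder is hypothetical, and the buffer is declared as unsigned char rather than char because that is the element type the standard names for arrays that provide storage for other objects.

```cpp
#include <new>  // placement new and std::launder (C++17)

// Hypothetical helper, for illustration only.
int read_via_launder() {
    alignas(int) unsigned char data[sizeof(int)];

    int* created = new (data) int;  // starts the lifetime of an int inside data
    *created = 34;

    // data still names the array, not the int. std::launder turns the
    // reinterpreted address into a pointer to the int that now lives there.
    return *std::launder(reinterpret_cast<int*>(data));
}
```

When the pointer returned by placement new is still available, using it directly is simpler and avoids the question entirely.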

But at what point can we do a reinterpret_cast on a char array and have it NOT be undefined behavior? Here are a few simple examples:

  1. No new, just reinterpret_cast:

alignas(int) char data[sizeof(int)];
*reinterpret_cast<int*>(data) = 42;    // is the first cast write UB?
int i = *reinterpret_cast<int*>(data); // how about a read?
*reinterpret_cast<int*>(data) = 4;     // how about the second write?
int j = *reinterpret_cast<int*>(data); // or the second read?

When does the lifetime for the int start? Is it with the declaration of data? If so, when does the lifetime of data end?

What if data were a pointer?

char* data_ptr = new char[sizeof(int)];
*reinterpret_cast<int*>(data_ptr) = 4;     // is this UB?
int i = *reinterpret_cast<int*>(data_ptr); // how about the read?

  • What if I'm just receiving structs on the wire and want to conditionally cast them based on what the first byte is?

    // bunch of handle functions that do stuff with the members of these types
    void handle(MsgType1 const& );
    void handle(MsgTypeF const& );
    
    char buffer[100];
    ::recv(some_socket, buffer, sizeof(buffer), 0);
    
    switch (buffer[0]) {
    case '1':
        handle(*reinterpret_cast<MsgType1*>(buffer)); // is this UB?
        break;
    case 'F':
        handle(*reinterpret_cast<MsgTypeF*>(buffer));
        break;
    // ...
    }
    

    Are any of these cases UB? Are all of them? Does the answer to this question change between C++11 and C++1z?

    Recommended answer

    There are two rules at play here:

    1. [basic.lval]/8, aka, the strict aliasing rule: simply put, you can't access an object through a pointer/reference to the wrong type.

    2. [basic.life]/8: simply put, if you reuse storage for an object of a different type, you can't use pointers to the old object(s) without laundering them first.
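
    The aliasing rule is asymmetric, which is worth keeping in mind when reading the cases below: [basic.lval] permits examining the bytes of any object through char, unsigned char, or std::byte, but not the reverse. A small sketch of the permitted direction follows; the function name count_nonzero_bytes is hypothetical.

```cpp
#include <cstddef>

// Hypothetical helper: inspects the object representation of an int.
// Reading an int's bytes through unsigned char* is allowed by the
// aliasing rule; reading a char array through int* is not.
std::size_t count_nonzero_bytes(const int& value) {
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&value);
    std::size_t count = 0;
    for (std::size_t k = 0; k < sizeof(int); ++k) {
        if (bytes[k] != 0) {
            ++count;
        }
    }
    return count;
}
```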

    These rules are an important part of making a distinction between "a memory location" or "a region of storage" and "an object".

    All of your code examples fall prey to the same problem: they're not the object you cast them to:

    alignas(int) char data[sizeof(int)];
    

    That creates an object of type char[sizeof(int)]. That object is not an int. Therefore, you may not access it as if it were. It doesn't matter if it is a read or a write; you still provoke UB.
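
    One way to make the first example well defined, sketched here under the assumption that the goal is simply to keep an int in that storage, is to create the int with placement new and do every read and write through the pointer it returns. The function name write_then_read is hypothetical, and unsigned char is used because it is the element type the standard names for arrays that provide storage.

```cpp
#include <new>  // placement new

// Hypothetical helper mirroring the two writes and two reads in the question.
int write_then_read() {
    alignas(int) unsigned char data[sizeof(int)];

    int* p = new (data) int(42);  // an int object now exists in data
    int i = *p;                   // first read: 42
    *p = 4;                       // second write goes to the same int
    int j = *p;                   // second read: 4

    return i + j;
}
```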

    Similarly:

    char* data_ptr = new char[sizeof(int)];
    

    That also creates an object of type char[sizeof(int)].
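
    The same approach can be sketched for the heap case. This assumes, as before, that unsigned char is used for the storage array. An array allocated with new of unsigned char is suitably aligned for any object that fits in it, so no alignas is needed here. The function name heap_roundtrip is hypothetical.

```cpp
#include <new>  // placement new

// Hypothetical helper: creates an int in heap storage and reads it back.
int heap_roundtrip() {
    unsigned char* data_ptr = new unsigned char[sizeof(int)];

    int* p = new (data_ptr) int(4);  // start the int's lifetime in the buffer
    int i = *p;                      // access only through the returned pointer

    delete[] data_ptr;               // int is trivially destructible; release the array
    return i;
}
```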

    char buffer[100];
    

    This creates an object of type char[100]. That object is neither a MsgType1 nor a MsgTypeF. So you cannot access it as if it were either.

    Note that the UB here is when you access the buffer as one of the Msg* types, not when you check the first byte. If all your Msg* types are trivially copyable, it's perfectly acceptable to read the first byte, then copy the buffer into an object of the appropriate type.

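    // requires #include <cstring> for std::memcpy
    switch (buffer[0]) {
    case '1':
        {
            MsgType1 msg;
            std::memcpy(&msg, buffer, sizeof(msg)); // copy the bytes into a real MsgType1
            handle(msg);
        }
        break;
    case 'F':
        {
            MsgTypeF msg;
            std::memcpy(&msg, buffer, sizeof(msg)); // copy the bytes into a real MsgTypeF
            handle(msg);
        }
        break;
    // ...
    }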
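
    The copy step can be factored into a small helper, sketched below. Everything in it is an assumption made for illustration: the name read_as, the bounds check against the number of bytes actually received, and the MsgType1 layout, which the question never shows.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical message layout; the real MsgType1 is not given in the question.
struct MsgType1 {
    char tag;
    char payload[7];
};

// Copies sizeof(T) bytes from buffer into out.
// Returns false, leaving out untouched, if fewer bytes are available.
template <typename T>
bool read_as(const char* buffer, std::size_t available, T& out) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy into T requires a trivially copyable type");
    if (available < sizeof(T)) {
        return false;
    }
    std::memcpy(&out, buffer, sizeof(T));
    return true;
}
```

    A call site would look like `MsgType1 msg; if (read_as(buffer, bytes_received, msg)) handle(msg);`, where bytes_received stands for the value returned by recv.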
    

    Note that this is about what the language standard defines as undefined behavior. In practice, odds are good that the compiler would be just fine with any of these.

    Does the answer to this question change between C++11 and C++1z?

    There have been some significant rule clarifications since C++11 (particularly [basic.life]). But the intent behind the rules hasn't changed.
