reinterpret_cast, char*, and undefined behavior
Question
What are the cases where reinterpret_casting a char* (or char[N]) is undefined behavior, and when is it defined behavior? What is the rule of thumb I should be using to answer this question?
As we learned from this question, the following is undefined behavior:
alignas(int) char data[sizeof(int)];
int *myInt = new (data) int; // OK
*myInt = 34; // OK
int i = *reinterpret_cast<int*>(data); // <== UB! have to use std::launder
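For comparison, here is a minimal sketch of two ways to read the value back without UB. It assumes C++17, which is when std::launder was added to <new>. The first option keeps the pointer that placement new returns; the second launders the address of data. The function names are hypothetical.

```cpp
#include <cassert>
#include <new>  // placement new; std::launder (C++17)

// Option 1: keep and use the pointer returned by placement new.
int read_via_placement_pointer() {
    alignas(int) char data[sizeof(int)];
    int* myInt = new (data) int;  // begins the lifetime of an int in data's storage
    *myInt = 34;
    return *myInt;                // OK: myInt points to the int object
}

// Option 2 (C++17): recover a usable pointer from the storage address.
int read_via_launder() {
    alignas(int) char data[sizeof(int)];
    new (data) int(34);
    // std::launder yields a pointer to the int object living at that address.
    return *std::launder(reinterpret_cast<int*>(data));
}
```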
But at what point can we do a reinterpret_cast on a char array and have it NOT be undefined behavior? Here are a few simple examples:
- No new, just reinterpret_cast:
alignas(int) char data[sizeof(int)];
*reinterpret_cast<int*>(data) = 42; // is the first cast write UB?
int i = *reinterpret_cast<int*>(data); // how about a read?
*reinterpret_cast<int*>(data) = 4; // how about the second write?
int j = *reinterpret_cast<int*>(data); // or the second read?
When does the lifetime of the int start? Is it with the declaration of data? If so, when does the lifetime of data end?
What if data is a pointer?
char* data_ptr = new char[sizeof(int)];
*reinterpret_cast<int*>(data_ptr) = 4; // is this UB?
int i = *reinterpret_cast<int*>(data_ptr); // how about the read?
What if I'm just receiving structs on the wire and want to conditionally cast them based on what the first byte is?
// bunch of handle functions that do stuff with the members of these types
void handle(MsgType1 const& );
void handle(MsgTypeF const& );
char buffer[100];
::recv(some_socket, buffer, sizeof buffer, 0);
switch (buffer[0]) {
case '1':
handle(*reinterpret_cast<MsgType1*>(buffer)); // is this UB?
break;
case 'F':
handle(*reinterpret_cast<MsgTypeF*>(buffer));
break;
// ...
}
Are any of these cases UB? Are all of them? Does the answer to this question change between C++11 and C++1z?
Answer
There are two rules at play here:
- [basic.lval]/8, aka the strict aliasing rule: simply put, you can't access an object through a pointer or reference to the wrong type.
- [basic.life]/8: simply put, if you reuse storage for an object of a different type, you can't use pointers to the old object(s) without laundering them first.
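The strict aliasing rule only works in one direction. char and unsigned char (and, since C++17, std::byte) are on the list of types through which any object may be accessed, so viewing an int through unsigned char* is well defined. That is the opposite of viewing a char array through int*. A small sketch with a hypothetical byte_sum function:

```cpp
#include <cassert>
#include <cstddef>

// Reading an int's object representation through unsigned char* is allowed:
// [basic.lval] permits char and unsigned char glvalues to access any object.
unsigned byte_sum(int value) {
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&value);
    unsigned sum = 0;
    for (std::size_t i = 0; i < sizeof value; ++i) {
        sum += bytes[i];  // each byte read is a char-typed access: no aliasing violation
    }
    return sum;
}
```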
These rules are an important part of making a distinction between "a memory location" or "a region of storage" and "an object".
All of your code examples fall prey to the same problem: the storage you cast is not an object of the type you cast it to.
alignas(int) char data[sizeof(int)];
That creates an object of type char[sizeof(int)]. That object is not an int. Therefore, you may not access it as if it were one. It doesn't matter whether it is a read or a write; you still provoke UB.
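If what you actually want is to store an int's value in the char array and get it back, std::memcpy does that without ever accessing the array as an int. A minimal sketch (the function name is hypothetical):

```cpp
#include <cassert>
#include <cstring>

// Copy an int's bytes into a char array and back out with std::memcpy.
// No char array is ever accessed as an int, so there is no aliasing
// violation, and memcpy does not even need alignas on the array.
int round_trip_through_chars(int value) {
    char data[sizeof(int)];
    std::memcpy(data, &value, sizeof value);  // "write": copy the bytes in
    int i;
    std::memcpy(&i, data, sizeof i);          // "read": copy the bytes out into a real int
    return i;
}
```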
Similarly:
char* data_ptr = new char[sizeof(int)];
That also creates an object of type char[sizeof(int)].
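To get an int into that storage, you have to create one, for example with placement new, and then access it through the pointer the new-expression returns. This is a minimal sketch under two stated assumptions. First, the function name is hypothetical. Second, it allocates unsigned char rather than char. Since C++17, an unsigned char (or std::byte) array is specified to provide storage for objects created inside it, and that keeps the final delete[] uncontroversial.

```cpp
#include <cassert>
#include <new>

int store_in_heap_storage(int value) {
    // An array new of unsigned char is aligned for any fundamental-alignment
    // type no larger than the array, so this storage is suitably aligned for int.
    unsigned char* data_ptr = new unsigned char[sizeof(int)];
    int* p = new (data_ptr) int(value);  // the int's lifetime begins here
    int result = *p;                     // OK: p points to an int object
    // int is trivially destructible, so no destructor call is needed
    // before the storage is released.
    delete[] data_ptr;
    return result;
}
```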
char buffer[100];
This creates an object of type char[100]. That object is neither a MsgType1 nor a MsgTypeF. So you cannot access it as if it were either.
Note that the UB here is when you access the buffer as one of the Msg* types, not when you check the first byte. If all your Msg* types are trivially copyable, it's perfectly acceptable to read the first byte, then copy the buffer into an object of the appropriate type.
switch (buffer[0]) {
case '1':
{
MsgType1 msg;
memcpy(&msg, buffer, sizeof(MsgType1));
handle(msg);
}
break;
case 'F':
{
MsgTypeF msg;
memcpy(&msg, buffer, sizeof(MsgTypeF));
handle(msg);
}
break;
// ...
}
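A self-contained sketch of the same approach, with its assumptions made explicit. The struct layouts, the read_as helper, and the handlers returning int are all hypothetical, since the question never defines them. The buffer is filled by hand instead of by recv. The sketch also assumes the sender wrote the structs with the same layout, padding, and byte order as the receiver.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical message layouts, each starting with the tag byte the switch inspects.
struct MsgType1 { char tag; int value; };
struct MsgTypeF { char tag; double ratio; };

// Hypothetical helper: copy the first sizeof(T) bytes of buffer into a fresh T.
// The caller must guarantee buffer holds at least sizeof(T) bytes,
// and T must be default-constructible.
template <typename T>
T read_as(const char* buffer) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy into T is only valid for trivially copyable types");
    T value;
    std::memcpy(&value, buffer, sizeof(T));
    return value;
}

// Stand-in handlers that return something checkable.
int handle(MsgType1 const& msg) { return msg.value; }
int handle(MsgTypeF const& msg) { return static_cast<int>(msg.ratio); }

// Dispatches on the tag byte; returns -1 for an unknown tag.
int dispatch(const char* buffer) {
    switch (buffer[0]) {  // reading a char out of a char array is fine
    case '1':
        return handle(read_as<MsgType1>(buffer));
    case 'F':
        return handle(read_as<MsgTypeF>(buffer));
    default:
        return -1;
    }
}
```

Because each copy lands in a real MsgType1 or MsgTypeF object, every handler receives an object whose lifetime has actually begun. The static_assert turns a non-trivially-copyable message type into a compile error rather than silent UB.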
Note that we're talking about what the language standard says is undefined behavior. In practice, odds are good that the compiler would be just fine with any of these.
Does the answer to this question change between C++11 and C++1z?
There have been some significant rule clarifications since C++11 (particularly [basic.life]). But the intent behind the rules hasn't changed.