C++'s Strict Aliasing Rule - Is the 'char' aliasing exemption a 2-way street?

Question

Just a couple of weeks ago, I learned that the C++ Standard has a strict aliasing rule. Basically, I had asked a question about shifting bits: rather than shifting each byte one at a time, to maximize performance I wanted to load my processor's native registers (32 or 64 bits, respectively) and perform the shift of 4/8 bytes all in a single instruction.

Here's the code I wanted to avoid:

unsigned char buffer[] = { 0xab, 0xcd, 0xef, 0x46 };

for (int i = 0; i < 3; ++i)
{
  buffer[i] <<= 4; 
  buffer[i] |= (buffer[i + 1] >> 4);
}
buffer[3] <<= 4;

And instead, I wanted to use something like:

unsigned char buffer[] = { 0xab, 0xcd, 0xef, 0x46 };
unsigned int *p = (unsigned int*)buffer; // unsigned int is 32 bit on my platform
*p <<= 4;

Someone called out in a comment that my proposed solution violated the C++ aliasing rules (because p was of type int* and buffer was of type char*, and I was dereferencing p to perform the shift). (Please ignore possible issues of alignment and byte order; I handle those outside of this snippet.) I was quite surprised to learn about the strict aliasing rule, since I regularly operate on data from buffers, casting it from one type to another, and have never had any issue. Further investigation revealed that the compiler I use (MSVC) doesn't enforce strict aliasing rules, and since I only develop on gcc/g++ in my spare time as a hobby, I likely just hadn't encountered the issue yet.
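For comparison, here is a minimal sketch of one way to get the "single shift" effect without dereferencing a cast pointer at all. It assumes the buffer is exactly 4 bytes and is meant to be treated as one big-endian 32-bit value, which is what the byte-by-byte loop above does; shift_left_4 is a hypothetical helper name, not something from the original discussion.

```cpp
#include <cstdint>

// Shift a 4-byte buffer left by 4 bits, treating it as one big-endian
// 32-bit value (same result as the byte-by-byte loop).
// shift_left_4 is a hypothetical name used only for this illustration.
void shift_left_4(unsigned char (&buffer)[4])
{
    // Assemble the word from individual bytes. No pointer cast is involved,
    // so aliasing, alignment and host byte order do not come into play.
    std::uint32_t word = (std::uint32_t{buffer[0]} << 24)
                       | (std::uint32_t{buffer[1]} << 16)
                       | (std::uint32_t{buffer[2]} << 8)
                       |  std::uint32_t{buffer[3]};

    word <<= 4; // one shift covers all four bytes

    // Scatter the result back, most significant byte first.
    // Converting to unsigned char keeps only the low 8 bits.
    buffer[0] = static_cast<unsigned char>(word >> 24);
    buffer[1] = static_cast<unsigned char>(word >> 16);
    buffer[2] = static_cast<unsigned char>(word >> 8);
    buffer[3] = static_cast<unsigned char>(word);
}
```

Whether this is as fast as the cast version depends on the compiler; optimizers commonly recognize this byte-assembly pattern, but that is worth checking in the generated code rather than assuming.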

So then I asked a question about Strict Aliasing Rules and C++'s Placement new operator:

IsoCpp.org offers a FAQ regarding placement new and they provide the following code example:

#include <new>        // Must #include this to use "placement new"
#include "Fred.h"     // Declaration of class Fred
void someCode()
{
  char memory[sizeof(Fred)];     // Line #1
  void* place = memory;          // Line #2
  Fred* f = new(place) Fred();   // Line #3 (see "DANGER" below)
  // The pointers f and place will be equal
  // ...
}

The example is simple enough, but I'm asking myself, "What if someone calls a method on f, e.g. f->talk()?" At that point we would be dereferencing f, which points to the same memory location as memory (of type char*). I've read in numerous places that there is an exemption for variables of type char* to alias any type, but I was under the impression that it wasn't a "two-way street": char* can alias (read/write) any type T, but type T can only be used to alias a char* if T itself is of char*. As I'm typing this, that doesn't make any sense to me, and so I'm leaning towards the belief that the claim that my initial (bit shifting) example violated the strict aliasing rule is false.

Can someone please explain what is correct? I've been going nuts with trying to understand what is legal and what is not (despite having read numerous websites and SO posts on the topic)

Thanks

Recommended answer

The aliasing rule means that the language only promises your pointer dereferences to be valid (i.e. not trigger undefined behaviour) if:


  • You access an object through a pointer of a compatible class: either its actual class or one of its superclasses, properly cast. This means that if B is a superclass of D and you have D* d pointing to a valid D, accessing the pointer returned by static_cast<B*>(d) is OK, but accessing that returned by reinterpret_cast<B*>(d) is not. The latter may have failed to account for the layout of the B sub-object inside D.
  • You access it through a pointer to char. Since char is byte-sized and byte-aligned, any data you are able to read through a D* you are also able to read through a char*.
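As a rough illustration of the second bullet, the sketch below reads the bytes of an arbitrary object through an unsigned char pointer (unsigned char enjoys the same exemption as char). The helper name object_bytes is an assumption made for this example only.

```cpp
#include <cstddef>
#include <vector>

// Copy out the bytes of any object by reading it through a pointer to
// unsigned char, which the aliasing rules permit for every object type.
// object_bytes is a hypothetical helper name used for illustration.
template <typename T>
std::vector<unsigned char> object_bytes(const T& object)
{
    const unsigned char* first =
        reinterpret_cast<const unsigned char*>(&object);
    return std::vector<unsigned char>(first, first + sizeof(T));
}
```

Calling object_bytes on an int, for instance, yields a vector of sizeof(int) bytes whose order depends on the platform's endianness.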

That said, other rules in the standard (in particular those about array layout and POD types) can be read as ensuring that you can use pointers and reinterpret_cast<T*> to alias in both directions between POD types and char arrays, provided the char array has the appropriate size and alignment.

In other words, this is legal:

int* ia = new int[3];
char* pc = reinterpret_cast<char*>(ia);
// Possibly in some other function
int* pi = reinterpret_cast<int*>(pc);

While this may invoke undefined behaviour:

char* some_buffer; size_t offset; // Possibly passed in as an argument
int* pi = reinterpret_cast<int*>(some_buffer + offset);
pi[2] = -5;

Even if we can ensure that the buffer is big enough to contain three ints, the alignment might not be right. As with all instances of undefined behaviour, the compiler may do absolutely anything. Three common occurrences could be:


  • The code might Just Work (TM) because on your platform the default alignment of all memory allocations is the same as that of int.
  • The pointer cast might round the address to the alignment of int (something like pi = pc & -4), potentially making you read/write to the wrong memory.
  • The pointer dereference itself may fail in some way: the CPU could reject misaligned accesses, making your application crash.
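One common way to sidestep all three outcomes is to never form an int* into the buffer in the first place and to copy bytes with std::memcpy instead. The following is a minimal sketch under that assumption; store_int and load_int are hypothetical helper names, and the caller is still responsible for making sure the buffer has at least sizeof(int) bytes available at the given offset.

```cpp
#include <cstddef>
#include <cstring>

// Store an int at an arbitrary byte offset inside a char buffer.
// std::memcpy copies bytes, so no int* to a possibly misaligned
// address is ever created or dereferenced.
void store_int(char* some_buffer, std::size_t offset, int value)
{
    std::memcpy(some_buffer + offset, &value, sizeof value);
}

// Read an int back from an arbitrary byte offset in the same way.
int load_int(const char* some_buffer, std::size_t offset)
{
    int value;
    std::memcpy(&value, some_buffer + offset, sizeof value);
    return value;
}
```

With these helpers, the equivalent of pi[2] = -5 would be store_int(some_buffer, offset + 2 * sizeof(int), -5).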

Since you always want to ward off UB like the devil itself, you need a char array with the correct size and alignment. The easiest way to get that is simply to start with an array of the "right" type (int in this case), then fill it through a char pointer, which would be allowed since int is a POD type.
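A small sketch of that approach: the storage is declared as ints, so size and alignment are correct by construction, and it is filled byte by byte through a char pointer. The function name ints_from_bytes and the fixed count of three are assumptions for illustration; the source is assumed to hold at least 3 * sizeof(int) bytes in the platform's native int representation.

```cpp
#include <array>
#include <cstddef>

// Build three ints from raw bytes by writing into int storage through
// a char pointer. The storage is an int array, so it is correctly sized
// and aligned for int from the start.
// ints_from_bytes is a hypothetical name used for illustration.
std::array<int, 3> ints_from_bytes(const char* source)
{
    std::array<int, 3> values{};
    char* bytes = reinterpret_cast<char*>(values.data());
    for (std::size_t i = 0; i < sizeof(int) * values.size(); ++i)
        bytes[i] = source[i]; // writing through char* is permitted
    return values;
}
```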

Addendum: after using placement new, you will be able to call any function on the object. If the construction is correct and does not invoke UB due to the above, then you have successfully created an object at the desired place, so any calls are OK, even if the object is non-POD (e.g. because it has virtual functions). After all, any allocator class will likely use placement new to create the objects in the storage that it obtains. Note that this is only necessarily true if you use placement new; other usages of type punning (e.g. naïve serialization with fread/fwrite) may result in an object that is incomplete or incorrect, because some values in the object need to be treated specially to maintain class invariants.
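To make the addendum concrete, here is a hedged sketch of the FAQ example with the alignment concern addressed through alignas. The Fred class below is a made-up stand-in (the real Fred.h from the FAQ is not shown anywhere in this discussion), and talk_in_place is a hypothetical function name.

```cpp
#include <new>
#include <string>

// Hypothetical stand-in for the FAQ's Fred; it has a virtual function,
// so it is deliberately not a POD type.
struct Fred {
    virtual ~Fred() = default;
    virtual std::string talk() const { return "hello"; }
};

std::string talk_in_place()
{
    // Raw storage with the right size AND alignment for a Fred.
    alignas(Fred) unsigned char memory[sizeof(Fred)];

    Fred* f = new (memory) Fred();  // a real Fred object now lives in memory
    std::string result = f->talk(); // fine: f points to an actual Fred
    f->~Fred();                     // placement new needs an explicit destructor call
    return result;
}
```

The explicit destructor call is the counterpart of placement new: since the storage was not obtained with ordinary new, delete must not be used on f.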
