mmap and C++ strict aliasing rules

Question

Consider a POSIX.1-2008 compliant operating system, and let fd be a valid file descriptor (to an open file, read mode, enough data...). The following code adheres to the C++11 standard* (ignore error checking):

void* map = mmap(NULL, sizeof(int)*10, PROT_READ, MAP_PRIVATE, fd, 0);
int* foo = static_cast<int*>(map);

Now, does the following instruction break strict aliasing rules?

int bar = *foo;

According to the standard:

If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:

  • the dynamic type of the object,
  • a cv-qualified version of the dynamic type of the object,
  • a type similar (as defined in 4.4) to the dynamic type of the object,
  • a type that is the signed or unsigned type corresponding to the dynamic type of the object,
  • a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
  • an aggregate or union type that includes one of the aforementioned types among its elements or non-static data members (including, recursively, an element or non-static data member of a subaggregate or contained union),
  • a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
  • a char or unsigned char type.

What's the dynamic type of the object pointed to by map / foo? Is that even an object? The standard says:

The lifetime of an object of type T begins when: storage with the proper alignment and size for type T is obtained, and if the object has non-trivial initialization, its initialization is complete.

Does this mean that the mapped memory contains 10 int objects (suppose that the initial address is aligned)? But if it is true, wouldn't this apply also to this code (which clearly breaks strict aliasing)?

char baz[sizeof(int)];
int* p=reinterpret_cast<int*>(&baz);
*p=5;

Even more oddly, does that mean that declaring baz starts the lifetime of any (properly aligned) object of size 4?

Some context: I am mmap-ing a file which contains a chunk of data which I wish to directly access. Since this chunk is large I'd like to avoid memcpy-ing to a temporary object.
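For reference, one common way to sidestep the aliasing question without copying the whole chunk is to copy each element out with std::memcpy at the point of use. This is only a sketch and not part of the original post: load_int is a hypothetical helper name, and the claim that compilers usually lower a fixed-size memcpy like this to a plain load is an expectation to check on your own toolchain.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not from the original post): reads the index-th int
// from the raw bytes starting at `base`. No int glvalue ever refers to the
// mapped memory, so strict aliasing is not involved, and the source address
// does not need to be aligned for int.
inline int load_int(const void* base, std::size_t index) {
  assert(base != nullptr);
  const unsigned char* bytes = static_cast<const unsigned char*>(base);
  int value;
  std::memcpy(&value, bytes + index * sizeof(int), sizeof(int));
  return value;
}
```

With the mapping from the question, `int bar = load_int(map, 0);` would replace `int bar = *foo;`.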

*Can nullptr be used instead of NULL here? Is it implicitly converted to NULL? Any reference from the standard?
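On the footnote, a small compile-time sketch may help. nullptr is not converted to NULL; it has type std::nullptr_t and converts implicitly to a null pointer value of the parameter's type (C++11 4.10 [conv.ptr]), so it should be accepted wherever a void* address hint is expected. hint_is_null is a hypothetical stand-in for such a function and is not part of the original post.

```cpp
#include <cstddef>      // NULL, std::nullptr_t
#include <type_traits>  // std::is_same, std::is_convertible

// nullptr is a prvalue of type std::nullptr_t ...
static_assert(std::is_same<decltype(nullptr), std::nullptr_t>::value,
              "nullptr has type std::nullptr_t");
// ... which converts implicitly to any object pointer type, including void*.
static_assert(std::is_convertible<std::nullptr_t, void*>::value,
              "std::nullptr_t converts implicitly to void*");

// Hypothetical stand-in for a function that, like mmap, takes a void* hint.
constexpr bool hint_is_null(void* addr) { return addr == nullptr; }
```

Both `hint_is_null(nullptr)` and `hint_is_null(NULL)` yield true, which is the same conversion that happens in the first argument of mmap.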

Recommended answer

I believe simply casting does violate strict aliasing. Arguing that convincingly is above my paygrade, so here is an attempt at a workaround:

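#include <cstring>      // std::memcpy
#include <new>          // placement new
#include <type_traits>  // std::is_pod

// ptr must point to writable storage that is suitably aligned for T.
template<class T>
T* launder_raw_pod_at( void* ptr ) {
  static_assert( std::is_pod<T>::value, "this only works with plain old data" );
  char buff[sizeof(T)];
  std::memcpy( buff, ptr, sizeof(T) );  // save the bytes currently at ptr
  T* r = ::new(ptr) T;                  // start the lifetime of a T at ptr
  std::memcpy( ptr, buff, sizeof(T) );  // restore the original bytes
  return r;
}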

I believe the above code has zero observable side effects on memory and returns a pointer to a legally created T at location ptr.

Check if your compiler optimizes the above code to a no-op. To do so, it has to understand memcpy at a really fundamental level, and constructing a T has to do nothing to the memory there.

At least clang 4.0.0 can optimize this away.
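A usage sketch for the helper, under stated assumptions. Because the helper writes to the memory in the abstract machine, this sketch maps with PROT_READ | PROT_WRITE and MAP_PRIVATE rather than the question's PROT_READ; that choice is an assumption of the sketch, not something the answer states. sum_ints_at and sum_ints_in_file are hypothetical names, and the helper is repeated (with the POD check spelled as trivial plus standard-layout) only so the block compiles on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <new>
#include <type_traits>

#include <fcntl.h>     // open
#include <sys/mman.h>  // mmap, munmap
#include <unistd.h>    // close

// Copy of the answer's helper. The POD check is written as
// trivial + standard-layout because std::is_pod is deprecated since C++20.
template<class T>
T* launder_raw_pod_at(void* ptr) {
  static_assert(std::is_trivial<T>::value && std::is_standard_layout<T>::value,
                "this only works with plain old data");
  unsigned char buff[sizeof(T)];
  std::memcpy(buff, ptr, sizeof(T));  // save the bytes
  T* r = ::new (ptr) T;               // create a T in the abstract machine
  std::memcpy(ptr, buff, sizeof(T));  // put the bytes back
  return r;
}

// Hypothetical helper: sums `count` ints stored back to back at `base`.
// `base` must be writable and suitably aligned for int.
inline long sum_ints_at(void* base, std::size_t count) {
  assert(reinterpret_cast<std::uintptr_t>(base) % alignof(int) == 0);
  unsigned char* bytes = static_cast<unsigned char*>(base);
  long total = 0;
  for (std::size_t i = 0; i < count; ++i) {
    int* p = launder_raw_pod_at<int>(bytes + i * sizeof(int));
    total += *p;
  }
  return total;
}

// Hypothetical helper: maps the first `count` ints of the file at `path`
// and sums them. The file must hold at least `count` ints.
// Returns false if open or mmap fails.
inline bool sum_ints_in_file(const char* path, std::size_t count, long& out) {
  int fd = open(path, O_RDONLY);
  if (fd < 0) return false;
  // MAP_PRIVATE + PROT_WRITE: any store lands in a private copy-on-write
  // page, so the underlying file is never modified.
  void* map = mmap(nullptr, count * sizeof(int), PROT_READ | PROT_WRITE,
                   MAP_PRIVATE, fd, 0);
  close(fd);  // the mapping stays valid after the descriptor is closed
  if (map == MAP_FAILED) return false;
  out = sum_ints_at(map, count);
  munmap(map, count * sizeof(int));
  return true;
}
```

If the compiler does not elide the two copies, every touched page is duplicated by copy-on-write, which would defeat the purpose for a large file, so inspecting the generated code is worthwhile.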

What we do is we first copy the bytes away. Then we use placement new to create a T there. Finally, we copy the bytes back.

We have a legally created T with exactly the bytes we want in it.

But the copies away and back go through a local buffer, so they have no observable effect.

Constructing the object, if T is a POD, doesn't have to touch the bytes either; technically their values are indeterminate after construction, but a smart compiler simply does nothing.

So the compiler can work out that all this manipulation can be skipped at runtime. At the same time, we have in the abstract machine properly created an object with the proper bytes at that location. (assuming it has valid alignment! But that isn't this code's problem.)
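The alignment caveat can be checked by the caller. The following is a minimal sketch with hypothetical names (is_aligned_for, require_aligned_for); it relies on the common, implementation-defined assumption that converting a pointer to std::uintptr_t yields its numeric address.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if `ptr` is suitably aligned for an object of type T.
template<class T>
bool is_aligned_for(const void* ptr) {
  return reinterpret_cast<std::uintptr_t>(ptr) % alignof(T) == 0;
}

// Hypothetical helper: asserts the alignment in debug builds and passes the
// pointer through, so it can wrap the argument of launder_raw_pod_at<T>.
template<class T>
void* require_aligned_for(void* ptr) {
  assert(is_aligned_for<T>(ptr) && "address is not suitably aligned for T");
  return ptr;
}
```

In practice the start of a mapping returned by mmap is page-aligned and therefore aligned for int; the check matters mostly for offsets inside the mapped file.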
