Generic char[] based storage and avoiding strict-aliasing related UB


Problem description

I'm trying to build a class template that packs a bunch of types in a suitably large char array, and allows access to the data as individual correctly typed references. Now, according to the standard this can lead to strict-aliasing violation, and hence undefined behavior, as we're accessing the char[] data via an object that is not compatible with it. Specifically, the standard states:

If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:

  • the dynamic type of the object,
  • a cv-qualified version of the dynamic type of the object,
  • a type similar (as defined in 4.4) to the dynamic type of the object,
  • a type that is the signed or unsigned type corresponding to the dynamic type of the object,
  • a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
  • an aggregate or union type that includes one of the aforementioned types among its elements or non-static data members (including, recursively, an element or non-static data member of a subaggregate or contained union),
  • a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
  • a char or unsigned char type.

Given the wording of the aggregate-or-union bullet point (highlighted in the original post), I came up with the following alias_cast idea:

#include <iostream>
#include <type_traits>

template <typename T>
T alias_cast(void *p) {
    typedef typename std::remove_reference<T>::type BaseType;
    union UT {
        BaseType t;
    };
    return reinterpret_cast<UT*>(p)->t;
}

template <typename T, typename U>
class Data {
    union {
        long align_;
        char data_[sizeof(T) + sizeof(U)];
    };
public:
    Data(T t = T(), U u = U()) { first() = t; second() = u; }
    T& first() { return alias_cast<T&>(data_); }
    U& second() { return alias_cast<U&>(data_ + sizeof(T)); }
};


int main() {
    Data<int, unsigned short> test;
    test.first() = 0xdead;
    test.second() = 0xbeef;
    std::cout << test.first() << ", " << test.second() << "\n";
    return 0;
}

(The above test code, especially the Data class, is just a dumbed-down demonstration of the idea, so please don't point out how I should use std::pair or std::tuple. The alias_cast template should also be extended to handle cv-qualified types, and it can only be safely used if the alignment requirements are met, but I hope this snippet is enough to demonstrate the idea.)

This trick silences the warnings by g++ (when compiled with g++ -std=c++11 -Wall -Wextra -O2 -fstrict-aliasing -Wstrict-aliasing), and the code works, but is this really a valid way of telling the compiler to skip strict-aliasing based optimizations?

If it's not valid, then how would one go about implementing a char array based generic storage class like this without violating the aliasing rules?

Edit: replacing the alias_cast with a simple reinterpret_cast like this:

T& first() { return reinterpret_cast<T&>(*(data_ + 0)); }
U& second() { return reinterpret_cast<U&>(*(data_ + sizeof(T))); }

produces the following warning when compiled with g++:

aliastest-so-1.cpp: In instantiation of ‘T& Data::first() [with T = int; U = short unsigned int]’:
aliastest-so-1.cpp:28:16:   required from here
aliastest-so-1.cpp:21:58: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]

Solution

Using a union is almost never a good idea if you want to stick with strict conformance: unions have stringent rules when it comes to reading members (only the active member may be read). That said, implementations like to use unions as hooks for reliable behaviour, and perhaps that is what you are after. If that is the case, I defer to Mike Acton, who has written a nice (and long) article on aliasing rules, in which he comments on casting through a union.

To the best of my knowledge this is how you should deal with arrays of char types as storage:

// char or unsigned char are both acceptable
alignas(alignof(T)) unsigned char storage[sizeof(T)];
::new (&storage) T;
T* p = static_cast<T*>(static_cast<void*>(&storage));

The reason this is defined to work is that T is the dynamic type of the object here. The new expression reused the storage when it created the T object, and that reuse implicitly ended the lifetime of the unsigned char objects in storage (which happens trivially, as unsigned char is a, well, trivial type).
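As a minimal sketch of the storage-reuse argument, the function below constructs an int in an unsigned char buffer and reads it back through a pointer obtained by the double static_cast. The names round_trip_through_storage and demo_storage_reuse are hypothetical, chosen only for illustration. Since C++17, wrapping the converted pointer in std::launder is arguably the strictly conforming spelling; it is left out here to mirror the snippet above.

```cpp
#include <cassert>
#include <new>

// Hypothetical helper (not from the original post): builds an int inside a
// raw unsigned char buffer and reads it back through a correctly typed pointer.
inline int round_trip_through_storage(int value) {
    alignas(int) unsigned char storage[sizeof(int)];

    // Placement new starts the lifetime of an int object inside the buffer.
    ::new (static_cast<void*>(storage)) int(value);

    // Convert the buffer address to int* by going through void*.
    int* p = static_cast<int*>(static_cast<void*>(storage));

    // The access goes through the dynamic type of the object (int).
    // int is trivially destructible, so no destructor call is needed.
    return *p;
}

// Usage example for the helper above.
inline void demo_storage_reuse() {
    assert(round_trip_through_storage(0xdead) == 0xdead);
    assert(round_trip_through_storage(-1) == -1);
}
```

For a type with a non-trivial destructor, the owner of the buffer would also have to call that destructor explicitly before the buffer goes out of scope.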

You can still use e.g. storage[0] to read the bytes of the object as this is reading the object value through a glvalue of unsigned char type, one of the listed explicit exceptions. If on the other hand storage were of a different yet still trivial element type, you could still make the above snippet work but would not be able to do storage[0].
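The byte-access exception can be illustrated with a small sketch. The helper count_nonzero_bytes is a hypothetical name, and the code assumes unsigned has no padding bits, which holds on common platforms.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical helper: creates an unsigned object in unsigned char storage,
// then inspects its object representation byte by byte.
inline std::size_t count_nonzero_bytes(unsigned value) {
    alignas(unsigned) unsigned char storage[sizeof(unsigned)];
    ::new (static_cast<void*>(storage)) unsigned(value);

    std::size_t nonzero = 0;
    for (std::size_t i = 0; i < sizeof(unsigned); ++i) {
        // storage[i] is a glvalue of type unsigned char, which the
        // aliasing rule explicitly allows for reading any object.
        if (storage[i] != 0) {
            ++nonzero;
        }
    }
    return nonzero;
}

// Usage example: the results do not depend on endianness.
inline void demo_byte_access() {
    assert(count_nonzero_bytes(0u) == 0);
    assert(count_nonzero_bytes(0xFFu) == 1);
}
```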

The final piece to make the snippet sensible is the pointer conversion. Note that reinterpret_cast is not suitable in the general case. It can be valid given that T is standard-layout (there are additional restrictions on alignment, too), but if that is the case then using reinterpret_cast would be equivalent to static_casting via void* like I did. It makes more sense to use that form directly in the first place, especially considering that storage is used a lot in generic contexts. In any case, converting to and from void* is one of the standard conversions (with a well-defined meaning), and you want static_cast for those.

If you are worried at all about the pointer conversions (which is the weakest link in my opinion, and not the argument about storage reuse), then an alternative is to do

T* p = ::new (&storage) T;

which costs an additional pointer in storage if you want to keep track of it.
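A sketch of this alternative, under the assumption that a small wrapper owns the buffer: the class template Slot and the function demo_slot are hypothetical names, not part of the original answer. The wrapper stores the pointer returned by placement new, so no conversion from the buffer address is ever needed.

```cpp
#include <cassert>
#include <new>

// Hypothetical wrapper: holds one T in raw storage and remembers the
// pointer that placement new returned.
template <typename T>
class Slot {
    alignas(T) unsigned char storage_[sizeof(T)];
    T* p_;  // the extra pointer mentioned above

public:
    explicit Slot(const T& value)
        : p_(::new (static_cast<void*>(storage_)) T(value)) {}

    // The object was created manually, so it is destroyed manually.
    ~Slot() { p_->~T(); }

    // Copying would duplicate p_ and leave it pointing into the other
    // object's buffer, so it is disabled in this sketch.
    Slot(const Slot&) = delete;
    Slot& operator=(const Slot&) = delete;

    T& get() { return *p_; }
};

// Usage example for the wrapper above.
inline void demo_slot() {
    Slot<int> slot(42);
    assert(slot.get() == 42);
    slot.get() = 7;
    assert(slot.get() == 7);
}
```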

I heartily recommend the use of std::aligned_storage.
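Putting the pieces together, the Data class from the question could be sketched roughly as follows. This is one possible arrangement under stated assumptions, not the answerer's code: the members kOffsetU, kAlign, first_, second_ and bytes(), and the function demo_data, are all hypothetical. The sketch keeps the pointers returned by placement new and pads the offset of the second object so that it is suitably aligned. Note that std::aligned_storage is deprecated as of C++23; an alignas-qualified unsigned char array, as used earlier, is the equivalent spelling.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <type_traits>

template <typename T, typename U>
class Data {
    // Offset of the U object: sizeof(T) rounded up to a multiple of alignof(U).
    static constexpr std::size_t kOffsetU =
        (sizeof(T) + alignof(U) - 1) / alignof(U) * alignof(U);
    // The buffer must satisfy the stricter of the two alignments.
    static constexpr std::size_t kAlign =
        alignof(T) > alignof(U) ? alignof(T) : alignof(U);

    typename std::aligned_storage<kOffsetU + sizeof(U), kAlign>::type storage_;
    T* first_;   // pointers returned by placement new
    U* second_;

    // Byte-wise view of the buffer, used only for address arithmetic.
    unsigned char* bytes() {
        return reinterpret_cast<unsigned char*>(&storage_);
    }

public:
    // Sketch only: if constructing U throws, the T object is not destroyed.
    explicit Data(const T& t = T(), const U& u = U())
        : first_(::new (static_cast<void*>(bytes())) T(t)),
          second_(::new (static_cast<void*>(bytes() + kOffsetU)) U(u)) {}

    // Destroy in reverse order of construction.
    ~Data() {
        second_->~U();
        first_->~T();
    }

    Data(const Data&) = delete;
    Data& operator=(const Data&) = delete;

    T& first() { return *first_; }
    U& second() { return *second_; }
};

// Usage example mirroring main() from the question.
inline void demo_data() {
    Data<int, unsigned short> test;
    test.first() = 0xdead;
    test.second() = 0xbeef;
    assert(test.first() == 0xdead);
    assert(test.second() == 0xbeef);
}
```

Storing the two pointers trades some space for never having to convert the buffer address back to T* or U*. Recomputing the addresses on each access would save that space at the cost of relying on the pointer conversions discussed above.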
