Generic char[] based storage and avoiding strict-aliasing related UB
I'm trying to build a class template that packs a bunch of types into a suitably large char array and allows access to the data as individual, correctly typed references. According to the standard, this can lead to a strict-aliasing violation, and hence undefined behavior, because we are accessing the char[] data through a glvalue of a type that is not compatible with it. Specifically, the standard states:
If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:
- the dynamic type of the object,
- a cv-qualified version of the dynamic type of the object,
- a type similar (as defined in 4.4) to the dynamic type of the object,
- a type that is the signed or unsigned type corresponding to the dynamic type of the object,
- a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
- an aggregate or union type that includes one of the aforementioned types among its elements or non-static data members (including, recursively, an element or non-static data member of a subaggregate or contained union),
- a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
- a char or unsigned char type.
Given the wording of the highlighted bullet point (the one about aggregate or union types), I came up with the following alias_cast idea:

#include <iostream>
#include <type_traits>

template <typename T>
T alias_cast(void *p) {
    typedef typename std::remove_reference<T>::type BaseType;
    union UT {
        BaseType t;
    };
    return reinterpret_cast<UT*>(p)->t;
}

template <typename T, typename U>
class Data {
    union {
        long align_;
        char data_[sizeof(T) + sizeof(U)];
    };
public:
    Data(T t = T(), U u = U()) { first() = t; second() = u; }
    T& first() { return alias_cast<T&>(data_); }
    U& second() { return alias_cast<U&>(data_ + sizeof(T)); }
};

int main() {
    Data<int, unsigned short> test;
    test.first() = 0xdead;
    test.second() = 0xbeef;
    std::cout << test.first() << ", " << test.second() << "\n";
    return 0;
}

(The above test code, especially the Data class, is just a dumbed-down demonstration of the idea, so please don't point out that I should use std::pair or std::tuple. The alias_cast template should also be extended to handle cv-qualified types, and it can only be safely used if the alignment requirements are met, but I hope this snippet is enough to demonstrate the idea.)
This trick silences the warnings from g++ (when compiled with g++ -std=c++11 -Wall -Wextra -O2 -fstrict-aliasing -Wstrict-aliasing), and the code works, but is this really a valid way of telling the compiler to skip strict-aliasing based optimizations?
If it's not valid, then how would one go about implementing a char array based generic storage class like this without violating the aliasing rules?
Edit: replacing alias_cast with a simple reinterpret_cast like this:

    T& first() { return reinterpret_cast<T&>(*(data_ + 0)); }
    U& second() { return reinterpret_cast<U&>(*(data_ + sizeof(T))); }

produces the following warning when compiled with g++:

aliastest-so-1.cpp: In instantiation of ‘T& Data::first() [with T = int; U = short unsigned int]’:
aliastest-so-1.cpp:28:16: required from here
aliastest-so-1.cpp:21:58: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]

Using a union is almost never a good idea if you want to stick to strict conformance: unions have stringent rules about which member may be read (the active member, and only that one). That said, implementations like to use unions as hooks for reliable behaviour, and perhaps that is what you are after. If so, I defer to Mike Acton, who has written a nice (and long) article on aliasing rules in which he comments on casting through a union.
To the best of my knowledge this is how you should deal with arrays of char types as storage:

    // char or unsigned char are both acceptable
    alignas(alignof(T)) unsigned char storage[sizeof(T)];
    ::new (&storage) T;
    T* p = static_cast<T*>(static_cast<void*>(&storage));

The reason this is defined to work is that T is the dynamic type of the object here. The storage was reused when the new-expression created the T object, and that operation implicitly ended the lifetime of storage (which happens trivially, as unsigned char is a, well, trivial type).
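
As a rough illustration of the storage-reuse argument, here is a minimal sketch. The Point type and the construct_and_sum helper are hypothetical names invented for this example, not part of the original answer, and the sketch uses the pointer returned by the new-expression directly.

```cpp
#include <cassert>
#include <new>  // placement new

// Hypothetical example type; any object type would do.
struct Point {
    int x;
    int y;
};

// Constructs a Point inside raw byte storage, reads it back through a
// Point glvalue, and returns x + y.
inline int construct_and_sum(int x, int y) {
    alignas(Point) unsigned char storage[sizeof(Point)];

    // The new-expression starts the lifetime of a Point inside storage;
    // from here on, the dynamic type of the object at that address is Point.
    Point* p = ::new (static_cast<void*>(storage)) Point{x, y};

    // Non-array placement new constructs the object at the given address.
    assert(static_cast<void*>(p) == static_cast<void*>(storage));

    int sum = p->x + p->y;  // access through a glvalue of the dynamic type

    p->~Point();  // trivial here, but needed for non-trivial types
    return sum;
}
```

Calling construct_and_sum(2, 3) should yield 5. The point is only that every access to the Point goes through a Point glvalue, which is the first item in the quoted list.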
You can still use e.g. storage[0] to read the bytes of the object, as this is reading the object value through a glvalue of unsigned char type, one of the listed explicit exceptions. If, on the other hand, storage were of a different yet still trivial element type, you could still make the above snippet work but would not be able to do storage[0].
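
The byte-reading exception can be sketched as follows. This is only an illustration under assumptions: the helper name bytes_round_trip is made up, and std::uint32_t is chosen arbitrarily as the stored type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <new>

// Builds a std::uint32_t inside unsigned char storage, copies its bytes out
// through the unsigned char array, and checks that they reproduce the value.
inline bool bytes_round_trip(std::uint32_t value) {
    alignas(std::uint32_t) unsigned char storage[sizeof(std::uint32_t)];
    std::uint32_t* p =
        ::new (static_cast<void*>(storage)) std::uint32_t(value);

    unsigned char copy[sizeof(std::uint32_t)];
    for (std::size_t i = 0; i < sizeof copy; ++i) {
        // Reading the object's bytes through an unsigned char glvalue is one
        // of the explicitly permitted accesses.
        copy[i] = storage[i];
    }

    std::uint32_t rebuilt;
    std::memcpy(&rebuilt, copy, sizeof rebuilt);
    assert(rebuilt == *p);  // the copied bytes are the object representation
    return rebuilt == value;
}
```

The function is expected to return true for any input, since the bytes copied out are exactly the object representation of the stored value.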
The final piece that makes the snippet sensible is the pointer conversion. Note that reinterpret_cast is not suitable in the general case. It can be valid given that T is standard-layout (there are additional restrictions on alignment, too), but in that case using reinterpret_cast would be equivalent to static_casting via void* as I did. It makes more sense to use that form directly in the first place, especially considering that storage is used a lot in generic contexts. In any case, converting to and from void* is one of the standard conversions (with a well-defined meaning), and you want static_cast for those.
If you are worried at all about the pointer conversions (which are the weakest link in my opinion, rather than the argument about storage reuse), then an alternative is to do

    T* p = ::new (&storage) T;

which costs an additional pointer in storage if you want to keep track of it.
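
Applied to the Data class from the question, that alternative might look roughly like the sketch below. It is an assumption-laden illustration, not the answer's own code: the members first_, second_, kSecondOffset and kAlign are names invented here, and copying is simply disabled to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

template <typename T, typename U>
class Data {
    // Offset of the U object: sizeof(T) rounded up to a multiple of alignof(U).
    static constexpr std::size_t kSecondOffset =
        (sizeof(T) + alignof(U) - 1) / alignof(U) * alignof(U);
    // The buffer must satisfy the stricter of the two alignments.
    static constexpr std::size_t kAlign =
        alignof(T) > alignof(U) ? alignof(T) : alignof(U);

    alignas(kAlign) unsigned char data_[kSecondOffset + sizeof(U)];
    T* first_;   // pointers returned by placement new: the extra cost
    U* second_;  // mentioned above, but no pointer casts are needed later

public:
    explicit Data(T t = T(), U u = U())
        : first_(::new (static_cast<void*>(data_)) T(t)),
          second_(::new (static_cast<void*>(data_ + kSecondOffset)) U(u)) {
        // Sanity check: the computed offset is suitably aligned for U.
        assert(reinterpret_cast<std::uintptr_t>(second_) % alignof(U) == 0);
    }

    // The stored pointers refer into this object's own buffer, so the
    // compiler-generated copy operations would be wrong; disable them here.
    Data(const Data&) = delete;
    Data& operator=(const Data&) = delete;

    ~Data() {
        second_->~U();  // destroy in reverse order of construction
        first_->~T();
    }

    T& first() { return *first_; }
    U& second() { return *second_; }
};
```

With this sketch, Data<int, unsigned short> can be used as in the question's main(): assigning 0xdead to first() and 0xbeef to second() and reading them back goes through int and unsigned short glvalues only.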
I heartily recommend the use of std::aligned_storage.
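
A minimal sketch of what using std::aligned_storage could look like, assuming a hypothetical single-object wrapper named Slot (not from the original answer). Note that std::aligned_storage was deprecated in C++23, so newer compilers may warn; an alignas-qualified unsigned char array is the usual replacement.

```cpp
#include <cassert>
#include <cstdint>
#include <new>
#include <type_traits>

// Hypothetical wrapper that owns exactly one T inside aligned raw storage.
template <typename T>
class Slot {
    // Uninitialized storage with the size and alignment of T.
    typename std::aligned_storage<sizeof(T), alignof(T)>::type storage_;
    T* ptr_;  // pointer returned by placement new

public:
    explicit Slot(const T& value)
        : ptr_(::new (static_cast<void*>(&storage_)) T(value)) {
        // Sanity check: the storage is suitably aligned for T.
        assert(reinterpret_cast<std::uintptr_t>(ptr_) % alignof(T) == 0);
    }

    // ptr_ points into this object's own storage, so copying is disabled.
    Slot(const Slot&) = delete;
    Slot& operator=(const Slot&) = delete;

    ~Slot() { ptr_->~T(); }

    T& get() { return *ptr_; }
};
```

For example, a Slot<long> constructed from 42L should hand back 42L from get(), and assignments through get() modify the stored object in place.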