reinterpret_cast between char* and std::uint8_t* - safe?

Views: 774 | Posted: 2016/10/14 23:30:34 | Tags: c++ c++11 language-lawyer


Question

Now and then we all have to work with binary data. In C++ we work with sequences of bytes, and since the beginning char has been our building block. Defined to have a sizeof of 1, it is the byte. And all library I/O functions use char by default. All is good, but there was always a little concern, a little oddity that bugged some people: the number of bits in a byte is implementation-defined.

So in C99, it was decided to introduce several typedefs to let developers easily express themselves: the fixed-width integer types. Optional, of course, since we never want to hurt portability. Among them, uint8_t, migrated into C++11 as std::uint8_t, a fixed-width 8-bit unsigned integer type, was the perfect choice for people who really wanted to work with 8-bit bytes.

And so, developers embraced the new tools and started building libraries that expressively state that they accept 8-bit byte sequences, as std::uint8_t*, std::vector<std::uint8_t> or otherwise.

But, perhaps after very deep thought, the standardization committee decided not to require an implementation of std::char_traits<std::uint8_t>, thereby preventing developers from easily and portably instantiating, say, std::basic_fstream<std::uint8_t> and reading std::uint8_t values as binary data. Or maybe some of us don't care about the number of bits in a byte and are happy with it.

But unfortunately, two worlds collide, and sometimes you have to take data as char* and pass it to a library that expects std::uint8_t*. But wait, you say, isn't the width of char variable while std::uint8_t is fixed at 8 bits? Will this result in a loss of data?
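To make the scenario concrete, here is a minimal sketch, assuming a hypothetical library function `checksum` (not a real API) that takes `const std::uint8_t*`, plus an adapter for callers that hold their data as `char`. It only shows where the cast happens; whether reading through the cast pointer is well-defined is the question itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical library function that expects 8-bit bytes (not a real API).
std::uint32_t checksum(const std::uint8_t* data, std::size_t size)
{
    assert(data != nullptr || size == 0);
    std::uint32_t sum = 0;
    for (std::size_t i = 0; i < size; ++i)
        sum += data[i];  // each byte is read as a value in [0, 255]
    return sum;
}

// Adapter for callers whose bytes live in a char buffer (for example, filled by
// std::istream::read). Only the pointer type changes; no bytes are copied.
std::uint32_t checksum_chars(const char* data, std::size_t size)
{
    return checksum(reinterpret_cast<const std::uint8_t*>(data), size);
}
```

With `const char buf[] = {'\x01', '\x02', '\xFF'};`, `checksum_chars(buf, 3)` should give 258 on an implementation where std::uint8_t is unsigned char: the byte 0xFF is read as 255 whether or not plain char is signed.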

Well, there is some interesting Standardese on this. char is defined to hold exactly one byte, and a byte is the smallest addressable chunk of memory, so there can't be a type with a bit width smaller than that of char. Next, a byte is defined to be able to hold UTF-8 code units. This gives us the minimum: 8 bits. So now we have a typedef that is required to be exactly 8 bits wide and a type that is at least 8 bits wide. But are there alternatives? Yes, unsigned char. Remember that the signedness of char is implementation-defined. Any other type? Thankfully, no. All other integral types have required ranges that do not fit in 8 bits.
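These guarantees can be checked at compile time. A minimal sketch; `bits_per_byte` is a name introduced here for illustration:

```cpp
#include <climits>
#include <limits>

// Properties the question relies on, checked at compile time.
static_assert(sizeof(char) == 1, "char occupies exactly one byte by definition");
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");
// The next smallest standard integer type must cover at least 16 bits.
static_assert(std::numeric_limits<unsigned short>::digits >= 16,
              "unsigned short has at least 16 value bits");

// Number of bits in a byte on this implementation.
constexpr int bits_per_byte() { return CHAR_BIT; }
```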

Finally, std::uint8_t is optional, which means that a library using this type will not compile if it is not defined. But what if it compiles? I can say with a great degree of confidence that this means we are on a platform with 8-bit bytes and CHAR_BIT == 8.
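Instead of letting the build fail outright, a library can detect the optional typedef through the UINT8_MAX macro, which <cstdint> defines exactly when it provides std::uint8_t, and then assert the byte width. A sketch, with `has_uint8` as an illustrative name:

```cpp
#include <climits>
#include <cstdint>

// <cstdint> defines UINT8_MAX if and only if it provides std::uint8_t,
// so the macro can be used to detect the optional typedef.
#if defined(UINT8_MAX)
// 8 value bits, no padding, and a whole number of bytes of at least 8 bits each
// leave CHAR_BIT == 8 as the only possibility.
static_assert(CHAR_BIT == 8, "std::uint8_t exists, so a byte must have 8 bits");
constexpr bool has_uint8 = true;
#else
constexpr bool has_uint8 = false;
#endif
```

The static_assert encodes the reasoning above: since sizeof(std::uint8_t) must be a whole number of bytes and a byte has at least 8 bits, an exactly-8-bit type can only exist when CHAR_BIT is 8.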

Once we know that we have 8-bit bytes and that std::uint8_t is implemented as either char or unsigned char, can we assume that we can reinterpret_cast from char* to std::uint8_t* and vice versa? Is it portable?

This is where my Standardese reading skills fail me. I read about safely derived pointers ([basic.stc.dynamic.safety]) and, as far as I understand, the following:
std::uint8_t* buffer = /* ... */ ;
char* buffer2 = reinterpret_cast<char*>(buffer);
std::uint8_t* buffer3 = reinterpret_cast<std::uint8_t*>(buffer2);
is safe if we don't touch buffer2. Correct me if I'm wrong.
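The round trip can be written out as a function. A sketch that never dereferences the intermediate char*; [expr.reinterpret.cast] guarantees that converting a pointer to a type whose alignment is no stricter (char here) and back yields the original pointer. `round_trip` is a hypothetical helper:

```cpp
#include <cassert>
#include <cstdint>

// Converts std::uint8_t* -> char* -> std::uint8_t* without dereferencing the
// intermediate pointer. Since char has no stricter alignment than std::uint8_t,
// [expr.reinterpret.cast] guarantees the result equals the original pointer.
std::uint8_t* round_trip(std::uint8_t* buffer)
{
    char* buffer2 = reinterpret_cast<char*>(buffer);
    std::uint8_t* buffer3 = reinterpret_cast<std::uint8_t*>(buffer2);
    assert(buffer3 == buffer);  // the guaranteed round-trip property
    return buffer3;
}
```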

So, given the following preconditions:


CHAR_BIT == 8
std::uint8_t is defined.


Is it portable and safe to cast between char* and std::uint8_t* back and forth, assuming that we're working with binary data and that the possible difference in signedness between char and std::uint8_t doesn't matter?

I would appreciate references to the Standard with explanations.

EDIT: Thanks, Jerry Coffin. I'm going to add the quote from the Standard ([basic.lval], §3.10/10):

  If a program attempts to access the stored value of an object through a glvalue of other than one of the
  following types the behavior is undefined:
  
  ...
  
  — a char or unsigned char type.
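As a sketch of what this rule permits, the function below reads the bytes of an int through an unsigned char pointer and compares them with a std::memcpy copy. The name `matches_memcpy` is illustrative:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Reads the bytes of an int through unsigned char glvalues, which [basic.lval]
// (§3.10/10 in C++11) allows for any object, and compares them with a memcpy copy.
bool matches_memcpy(int value)
{
    unsigned char via_alias[sizeof(int)];
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&value);
    for (std::size_t i = 0; i < sizeof(int); ++i)
        via_alias[i] = p[i];  // well-defined: unsigned char may alias any object

    unsigned char via_memcpy[sizeof(int)];
    std::memcpy(via_memcpy, &value, sizeof(int));

    return std::memcmp(via_alias, via_memcpy, sizeof(int)) == 0;
}
```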
EDIT2: OK, going deeper. std::uint8_t is not guaranteed to be a typedef of unsigned char. It can be implemented as an extended unsigned integer type, and extended unsigned integer types are not included in §3.10/10. What now?
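One way to turn this worry into a build-time check is a sketch like the following, which reports whether std::uint8_t is a typedef for one of the narrow character types named in the aliasing rule. The constant `uint8_is_char_type` is a name introduced here, and its value is implementation-specific:

```cpp
#include <cstdint>
#include <type_traits>

// True when std::uint8_t is a typedef for unsigned char or (where char is
// unsigned) plain char, i.e. one of the types [basic.lval] allows to alias
// any object. The answer is implementation-specific; this only reports it.
constexpr bool uint8_is_char_type =
    std::is_same<std::uint8_t, unsigned char>::value ||
    std::is_same<std::uint8_t, char>::value;
```

A library could add `static_assert(uint8_is_char_type, "...");` so that it refuses to compile on an implementation using an extended integer type, instead of silently relying on the cast.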

EDIT3: Removed. 

EDIT4: I guess I'll keep this question open, because I'm not 100% sure and it raises an interesting discussion.

EDIT5: OK, doing a reinterpret_cast is not unsafe per se. It's when you try to dereference the resulting pointer that the problems start to arise. So we need to look at possible use cases of binary data as char* and std::uint8_t*. Some error checking was omitted for brevity.

Case 1:
#include <cassert>
#include <fstream>

int main()
{
    int value1 = 1234;

    {
        std::ofstream outputstream("test.bin", std::ios_base::binary);
        outputstream.write(reinterpret_cast<char*>(&value1), sizeof(value1));
    }

    int value2 = 1234;

    {
        std::ifstream inputstream("test.bin", std::ios_base::binary);
        inputstream.read(reinterpret_cast<char*>(&value2), sizeof(value2));
    }

    assert(value1 == value2);
}
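For testing Case 1 without touching the filesystem, the same write/read through char* can be done with a std::stringstream opened in binary mode. A minimal sketch; `round_trip_int` is a hypothetical name:

```cpp
#include <cassert>
#include <ios>
#include <sstream>

// In-memory variant of Case 1: the same write/read through char*, but with a
// std::stringstream in binary mode instead of a file. Returns the value read back.
int round_trip_int(int value1)
{
    std::stringstream stream(std::ios_base::in | std::ios_base::out | std::ios_base::binary);
    stream.write(reinterpret_cast<const char*>(&value1), sizeof(value1));

    int value2 = 0;
    stream.read(reinterpret_cast<char*>(&value2), sizeof(value2));
    assert(stream.gcount() == static_cast<std::streamsize>(sizeof(value2)));
    return value2;
}
```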
Case 2:
#include <cstdint>
#include <cassert>
#include <fstream>

class uint8_char_traits
{
    /* ... */
};

int main()
{
    int value1 = 1234;

    {
        std::basic_ofstream<std::uint8_t, uint8_char_traits> outputstream("test.bin", std::ios_base::binary);
        outputstream.write(reinterpret_cast<std::uint8_t*>(&value1), sizeof(value1));
    }

    int value2 = 1234;

    {
        std::basic_ifstream<std::uint8_t, uint8_char_traits> inputstream("test.bin", std::ios_base::binary);
        inputstream.read(reinterpret_cast<std::uint8_t*>(&value2), sizeof(value2));
    }

    assert(value1 == value2);
}
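A common alternative to writing uint8_char_traits is to keep the standard char streams and cast only at the boundary. A sketch, with `write_bytes` and `read_bytes` as hypothetical helper names; note that it relies on the same char/std::uint8_t cast the question asks about:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

// Writes a byte vector to an ordinary char stream by casting the data pointer,
// so no char_traits for std::uint8_t is needed.
void write_bytes(std::ostream& out, const std::vector<std::uint8_t>& bytes)
{
    out.write(reinterpret_cast<const char*>(bytes.data()),
              static_cast<std::streamsize>(bytes.size()));
}

// Reads exactly `count` bytes into `bytes`; returns false on a short read.
bool read_bytes(std::istream& in, std::vector<std::uint8_t>& bytes, std::size_t count)
{
    bytes.resize(count);
    in.read(reinterpret_cast<char*>(bytes.data()), static_cast<std::streamsize>(count));
    return static_cast<std::size_t>(in.gcount()) == count;
}
```

For example, writing `{0x00, 0x7F, 0x80, 0xFF}` into a std::stringstream opened with `in | out | binary` and reading four bytes back should reproduce the same vector.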
Case 3:
#include <cstdint>
#include <cassert>
#include <cstring>
#include <fstream>
#include <memory>
#include <algorithm>

void library_function1(char* buffer, std::size_t size);
void library_function2(std::uint8_t* buffer, std::size_t size);

int main()
{
    const std::size_t size = sizeof(int);

    std::unique_ptr<char[]> data(new char[size]);

    {
        std::ifstream inputstream("test.bin", std::ios_base::binary);
        inputstream.read(data.get(), size);
    }

    library_function1(data.get(), size);
    library_function2(reinterpret_cast<std::uint8_t*>(data.get()), size);
}

void library_function1(char* buffer, std::size_t size)
{
    //Subcase 1:
    int value1 = *reinterpret_cast<int*>(buffer);

    //Subcase 2:
    int value2 = 0;
    std::memcpy(&value2, buffer, sizeof(int));

    //Subcase 3:
    int value3 = 0;
    char* ptr = reinterpret_cast<char*>(&value3);
    std::copy(buffer, buffer + sizeof(int), ptr);

    assert((value1 == value2) && (value2 == value3));
}

void library_function2(std::uint8_t* buffer, std::size_t size)
{
    //Subcase 4:
    int value1 = *reinterpret_cast<int*>(buffer);

    //Subcase 5:
    int value2 = 0;
    std::memcpy(&value2, buffer, sizeof(int));

    //Subcase 6:
    int value3 = 0;
    std::uint8_t* ptr = reinterpret_cast<std::uint8_t*>(&value3);
    std::copy(buffer, buffer + sizeof(int), ptr);

    assert((value1 == value2) && (value2 == value3));
}
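Subcases 1 and 4 read through an int* that points into a char array, where no int object lives, so they fall outside the types listed in [basic.lval] (and in general such a pointer may also be misaligned). The memcpy route of Subcases 2 and 5 avoids both issues. A sketch of a helper that works for either pointer type; `load_int` is a hypothetical name:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Loads an int from the start of a byte buffer by copying its bytes, so no int*
// into the buffer is ever dereferenced (the approach of Subcases 2 and 5).
// Intended for Byte = char, unsigned char or std::uint8_t.
template <typename Byte>
int load_int(const Byte* buffer, std::size_t size)
{
    static_assert(sizeof(Byte) == 1, "Byte must be a one-byte type");
    assert(size >= sizeof(int));  // the caller must supply at least sizeof(int) bytes
    int value = 0;
    std::memcpy(&value, buffer, sizeof(int));
    return value;
}
```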
These use cases should give a more clear intention of my question.
Solution
If uint8_t exists at all, essentially the only choice is that it's a typedef for unsigned char (or char if it happens to be unsigned). Nothing (but a bitfield) can represent less storage than a char, and the only other type that can be as small as 8 bits is a bool. The next smallest normal integer type is a short, which must be at least 16 bits.

As such, if uint8_t exists at all, you really only have two possibilities: you're either casting unsigned char to unsigned char, or casting signed char to unsigned char.

The former is an identity conversion, so obviously safe. The latter falls within the "special dispensation" given for accessing any other type as a sequence of char or unsigned char in §3.10/10, so it also gives defined behavior.

Since that includes both char and unsigned char, a cast to access it as a sequence of char also gives defined behavior.

Edit: As far as Luc's mention of extended integer types goes, I'm not sure how you'd manage to apply it to get a difference in this case. C++ refers to the C99 standard for the definitions of uint8_t and such, so the quotes throughout the remainder of this come from C99.

§6.2.6.1/3 specifies that unsigned char shall use a pure binary representation, with no padding bits. Padding bits are only allowed by §6.2.6.2/1, which specifically excludes unsigned char. That section, however, describes a pure binary representation in detail, literally to the bit. Therefore, unsigned char and uint8_t (if it exists) must be represented identically at the bit level.
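The claim that both types see the same bits can be spot-checked. A sketch, assuming std::uint8_t exists and is a character type (as on common implementations); it stores every possible byte value as unsigned char and reads it back through a std::uint8_t pointer. `same_value_for_all_bytes` is an illustrative name:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

static_assert(std::numeric_limits<std::uint8_t>::digits == 8, "exactly 8 value bits");
static_assert(std::numeric_limits<unsigned char>::digits == 8,
              "8-bit bytes, implied by the existence of std::uint8_t");

// Stores every possible byte value in an unsigned char and reads it back through
// a std::uint8_t pointer; returns true if every value comes back unchanged.
bool same_value_for_all_bytes()
{
    for (unsigned v = 0; v <= std::numeric_limits<unsigned char>::max(); ++v) {
        const unsigned char byte = static_cast<unsigned char>(v);
        const std::uint8_t* as_u8 = reinterpret_cast<const std::uint8_t*>(&byte);
        if (static_cast<unsigned>(*as_u8) != v)
            return false;
    }
    return true;
}
```

Where std::uint8_t is unsigned char the check is trivially true, which is the answer's point; a runtime test like this cannot by itself rule out an exotic extended-type implementation.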

To see a difference between the two, we have to assert that some particular bits when viewed as one would produce results different from when viewed as the other -- despite the fact that the two must have identical representations at the bit level.

To put it more directly: a difference in result between the two requires that they interpret bits differently -- despite a direct requirement that they interpret bits identically.

Even on a purely theoretical level, this appears difficult to achieve. On anything approaching a practical level, it's obviously ridiculous.
