Reading an int through a char* buffer behaves differently depending on whether it is positive or negative

Views: 102 | Published: 2020/7/29 21:17:49 | Tags: c++, binary-deserialization

Problem description

Background: I was wondering how to (manually) deserialize binary data that we receive through a char* buffer.

Assumptions: As a minimal example, we will assume here that:

I have only one int, serialized through a char* buffer.
I want to get the original int back from the buffer.
sizeof(int) == 4 on the target system/platform.
The target system/platform is little-endian.
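
The two platform assumptions above can be checked rather than taken on faith. The following is only a sketch: `is_little_endian` and `check_assumptions` are hypothetical helper names that do not appear in the original test case.

```cpp
#include <cassert>

// Compile-time check of the size assumption stated above.
static_assert(sizeof(int) == 4, "this example assumes a 4-byte int");

// Hypothetical helper: returns true when the least significant byte of an
// unsigned int is stored at the lowest address (little-endian layout).
bool is_little_endian()
{
    const unsigned int probe = 1u;
    const unsigned char * first = reinterpret_cast<const unsigned char*>(&probe);
    return *first == 1u;
}

// Hypothetical helper: aborts (in debug builds) if the byte-order assumption
// does not hold on the machine running the test.
void check_assumptions()
{
    assert(is_little_endian() && "this example assumes a little-endian platform");
}
```

Calling `check_assumptions()` at the start of `main()` would make a violated assumption fail early instead of producing confusing output.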

Note: This is purely out of general interest, so I don't want to use anything like std::memcpy, because that would hide the strange behaviour I encountered.

Test: I have built the following test case:

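#include <bitset>
#include <climits>   // CHAR_BIT
#include <cstddef>   // std::size_t
#include <iostream>

// Defined further down; prints num and the bytes behind num_bytes in binary.
void display(int num, char * num_bytes);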

int main()
{
    // Create neg_num and neg_num_bytes then display them
    int neg_num(-5000);
    char * neg_num_bytes = reinterpret_cast<char*>(&neg_num);
    display(neg_num, neg_num_bytes);

    std::cout << '\n';

    // Create pos_num and pos_num_bytes then display them
    int pos_num(5000);
    char * pos_num_bytes = reinterpret_cast<char*>(&pos_num);
    display(pos_num, pos_num_bytes);

    std::cout << '\n';

    // Get neg_num back from neg_num_bytes through bitmask operations
    int neg_num_back = 0;
    for(std::size_t i = 0; i < sizeof neg_num; ++i)
        neg_num_back |= static_cast<int>(neg_num_bytes[i]) << CHAR_BIT*i; // For little-endian

    // Get pos_num back from pos_num_bytes through bitmask operations
    int pos_num_back = 0;
    for(std::size_t i = 0; i < sizeof pos_num; ++i)
        pos_num_back |= static_cast<int>(pos_num_bytes[i]) << CHAR_BIT*i; // For little-endian

    std::cout << "Reconstructed neg_num: " << neg_num_back << ": " << std::bitset<CHAR_BIT*sizeof neg_num_back>(neg_num_back);
    std::cout << "\nReconstructed pos_num: " << pos_num_back << ":  " << std::bitset<CHAR_BIT*sizeof pos_num_back>(pos_num_back) << std::endl;

    return 0;
}

where display() is defined as:

// Warning: num_bytes must have a size of sizeof(int)
void display(int num, char * num_bytes)
{
    std::cout << num << " (from int)  : " << std::bitset<CHAR_BIT*sizeof num>(num) << '\n';
    std::cout << num << " (from char*): ";
    for(std::size_t i = 0; i < sizeof num; ++i)
        std::cout << std::bitset<CHAR_BIT>(num_bytes[sizeof num -1 -i]); // For little-endian
    std::cout << std::endl;
}

The output I get is:

-5000 (from int)  : 11111111111111111110110001111000
-5000 (from char*): 11111111111111111110110001111000

5000 (from int)  : 00000000000000000001001110001000
5000 (from char*): 00000000000000000001001110001000

Reconstructed neg_num: -5000: 11111111111111111110110001111000
Reconstructed pos_num: -120:  11111111111111111111111110001000

I know the test case code is quite hard to read. To explain it briefly:

I create an int.
I create a char* pointing to the first byte of the previously created int (to simulate having a real int stored in a char* buffer). The buffer is consequently 4 bytes long.
I display the int and its binary representation.
I display the int and the concatenation of the bytes stored in the char* buffer, to check that they are the same (the bytes are read in reverse order because of the endianness).
I try to get the original int back from the buffer.
I display the reconstructed int as well as its binary representation.

I performed this procedure for both a negative and a positive value. This is why the code is less readable than it should be (sorry for that).

As we can see, the negative value was reconstructed successfully, but the positive one was not (I expected 5000 and got -120).

I repeated the test with several other negative and positive values, and the conclusion is the same: it works fine with negative numbers but fails with positive numbers.

Question: I have trouble understanding why concatenating 4 chars into an int via bitwise shifts changes the char values for positive numbers, while they stay unchanged for negative numbers.

When we look at the binary representation, we can see that the reconstructed number is not composed of the chars that I concatenated.
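
To see what each byte contributes before it is OR-ed into the result, one can print the bit pattern of every char after its conversion to int. This is a minimal sketch, assuming that plain char is signed on the platform (this is implementation-defined) and that int is 32 bits wide; `promoted_bits` is a hypothetical helper name.

```cpp
#include <bitset>
#include <cassert>
#include <climits>
#include <cstddef>
#include <string>

// Hypothetical helper: bit pattern of bytes[index] after conversion to int,
// i.e. the value that the reconstruction loop shifts and ORs.
std::string promoted_bits(const char * bytes, std::size_t index)
{
    assert(bytes != nullptr);
    // If plain char is signed, a byte with its top bit set becomes a
    // negative int here, so all higher bits are filled with ones.
    const int promoted = static_cast<int>(bytes[index]);
    return std::bitset<CHAR_BIT * sizeof(int)>(
               static_cast<unsigned int>(promoted)).to_string();
}
```

For 5000 the lowest byte is 0x88. With a signed char it converts to -120, whose bit pattern is 11111111111111111111111110001000, which is exactly the reconstructed value shown in the output. For -5000 the lowest byte is 0x78, which has its top bit clear, and the upper bits are supposed to be ones anyway, so the sign extension of the later bytes goes unnoticed.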

Is it related to the static_cast<int>? If I remove it, the integral promotion rules will apply the conversion implicitly anyway. But I need the conversion, since the value must be an int so that the result of the shifts is not lost.
If this is the heart of the issue, how can I solve it?
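
One way this kind of problem is commonly addressed, sketched here under the assumption that sign extension of a signed plain char is what corrupts the result, is to convert each byte to unsigned char before widening it and to accumulate into an unsigned int. `read_int_le` is a hypothetical name, and the sketch avoids std::memcpy as requested in the note above.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Hypothetical helper: rebuilds an int from sizeof(int) bytes stored least
// significant byte first (little-endian layout).
int read_int_le(const char * bytes)
{
    assert(bytes != nullptr);
    unsigned int value = 0;
    for (std::size_t i = 0; i < sizeof(int); ++i)
    {
        // unsigned char keeps the byte in [0, UCHAR_MAX], so widening it
        // never fills the upper bits with ones.
        const unsigned int byte = static_cast<unsigned char>(bytes[i]);
        value |= byte << (CHAR_BIT * i);
    }
    // Converting an out-of-range unsigned value to int is
    // implementation-defined before C++20 and modular since C++20.
    return static_cast<int>(value);
}
```

Doing the shifts on an unsigned type also avoids left-shifting a negative int, which is undefined behaviour before C++20.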

Additionally: Is there a better way to get the value back than bitwise shifting? Something that does not depend on the endianness of the system/platform.
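
As a point of comparison, shifts operate on values rather than on memory, so a shift-based reader and writer that agree on a fixed byte order for the buffer give the same result on any host. The dependence on host endianness comes from reinterpreting the int's own memory as bytes. The following is only a sketch of that idea: `store_int_le` and `load_int_le` are hypothetical names, and least-significant-byte-first is an assumed buffer format.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Hypothetical helper: writes value into out[0 .. sizeof(int)-1], least
// significant byte first, regardless of how the host lays out an int.
void store_int_le(int value, unsigned char * out)
{
    assert(out != nullptr);
    const unsigned int bits = static_cast<unsigned int>(value);
    for (std::size_t i = 0; i < sizeof(int); ++i)
        out[i] = static_cast<unsigned char>(bits >> (CHAR_BIT * i)); // keeps the low byte
}

// Hypothetical counterpart: reads the same fixed layout back.
int load_int_le(const unsigned char * in)
{
    assert(in != nullptr);
    unsigned int bits = 0;
    for (std::size_t i = 0; i < sizeof(int); ++i)
        bits |= static_cast<unsigned int>(in[i]) << (CHAR_BIT * i);
    return static_cast<int>(bits); // implementation-defined before C++20 if out of range
}
```

A round trip such as `store_int_le(5000, buf)` followed by `load_int_le(buf)` returns 5000 on both little-endian and big-endian hosts, because neither function looks at the in-memory layout of the int.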

Maybe this should be a separate question.
