Big Endian and Little Endian: a little confusion


Question

I was reading about little and big endian representations from this site http://www.geeksforgeeks.org/little-and-big-endian-mystery/.

Suppose we have a number 0x01234567, then in little endian it is stored as (67)(45)(23)(01) and in Big endian it is stored as (01)(23)(45)(67).

char *s = "ABCDEF";
int *p = (int *)s;
printf("%d",*(p+1)); // prints 17475 (value of DC)

After seeing the printed value here in the above code, it seems that string is stored as (BA)(DC)(FE).

Why is it not stored as (EF)(CD)(AB), from LSB to MSB, as in the first example? I thought that endianness means the ordering of bytes within a multi-byte value. So shouldn't the ordering apply to the "whole 2 bytes", as in the second case, rather than within those 2 bytes?

Solution

Assuming 2-byte ints, this is what you have in memory:

memAddr  |  0  |  1  |  2  |  3  |  4  |  5  |  6   |
data     | 'A' | 'B' | 'C' | 'D' | 'E' | 'F' | '\0' |
            ^ s points here
                        ^ p+1 points here

Now, it looks like you're using ASCII encoding, so this is what you really have in memory

memAddr  |  0   |  1   |  2   |  3   |  4   |  5   |  6   |
data     | 0x41 | 0x42 | 0x43 | 0x44 | 0x45 | 0x46 | 0x00 |
            ^ s points here
                          ^ p+1 points here

On a little endian machine, the least significant byte of a multi-byte type comes first. There's no concept of endianness for a single-byte char, and an ASCII string is just a sequence of chars, so the string itself has no endianness. Your ints are 2 bytes. For an int starting at memory location 2, the byte at address 2 is the least significant and the byte at address 3 is the most significant. Read the way people generally read numbers, the value is therefore 0x4443 (17475 in base 10, "DC" as an ASCII string), since 0x44 at memory location 3 is more significant than 0x43 at memory location 2. On a big endian machine this would be reversed, and the number would be 0x4344 (17220 in base 10, "CD" as an ASCII string).

EDIT:

Addressing your comment: a C string is a NUL-terminated array of chars, that's absolutely correct. Endianness only applies to the primitive types: short, int, long, long long, etc. ("primitive types" may be incorrect nomenclature; someone who knows can correct me). An array is simply a section of contiguous memory where one or more elements of a type sit directly next to each other, stored sequentially. There is no concept of endianness for the array as a whole, but endianness does apply to the primitive types of the individual elements. Let's say you have the following, assuming 2-byte ints:

int array[3];  // with 2 byte ints, this occupies 6 contiguous bytes in memory
array[0] = 0x1234;
array[1] = 0x5678;
array[2] = 0x9abc;

This is what memory looks like, on both big and little endian machines:

memAddr   |    0-1   |    2-3   |    4-5   |
data      | array[0] | array[1] | array[2] |

Notice there is no concept of endianness for the order of the array elements. This is true no matter what the elements are: primitive types, structs, anything. The first element of the array always comes first in memory, at array[0].

But if we look at what's actually inside each element, this is where endianness comes into play. On a little endian machine, memory will look like this:

memAddr   |  0   |  1   |  2   |  3   |  4   |  5   |
data      | 0x34 | 0x12 | 0x78 | 0x56 | 0xbc | 0x9a |
             ^______^      ^______^      ^______^
             array[0]      array[1]      array[2]

The least significant bytes are first. A big endian machine would look like this:

memAddr   |  0   |  1   |  2   |  3   |  4   |  5   |
data      | 0x12 | 0x34 | 0x56 | 0x78 | 0x9a | 0xbc |
             ^______^      ^______^      ^______^
             array[0]      array[1]      array[2]

Notice that the contents of each element of the array are subject to endianness, because it's an array of primitive types. If it were an array of structs, the order of the struct members would not be reversed; endianness only applies to the primitives inside them. In either case, whether on a big or little endian machine, the array elements are still in the same order.

Getting back to your string, a string is simply a NUL terminated array of characters. chars are single bytes, so there's only one way to order them. Consider the code:

char word[] = "hey";

This is what you have in memory:

memAddr   |    0    |    1    |    2    |    3    |
data      | word[0] | word[1] | word[2] | word[3] |
                  equals NUL terminator '\0' ^

In this case, each element of the word array is a single byte, and there's only one way to order a single item, so whether on a little or big endian machine, this is what you'll have in memory:

memAddr   |  0   |  1   |  2   |  3   |
data      | 0x68 | 0x65 | 0x79 | 0x00 |

Endianness only applies to multi-byte primitive types. I highly recommend poking around in a debugger to see this live. All the popular IDEs have memory view windows, and with gdb you can print out memory as bytes, halfwords (2 bytes), words (4 bytes), giant words (8 bytes), etc. On a little endian machine, if you print your string as bytes, you'll see the letters in order. Print it as halfwords and you'll see every 2 letters "reversed"; print it as words and every 4 letters are "reversed", and so on. On a big endian machine, it would all print in the same "readable" order.
