How do bit fields and their alignments work in C programming?

Problem description

I need your help understanding how bit fields work in C programming.

I have declared this struct:

struct message
{
    unsigned char first_char : 6;
    unsigned char second_char : 6;
    unsigned char third_char : 6;
    unsigned char fourth_char : 6;
    unsigned char fifth_char : 6;
    unsigned char sixth_char : 6;
    unsigned char seventh_char : 6;
    unsigned char eigth_char : 6;
}__packed message;

I saved the size of the struct into an integer using sizeof(message).

I expected the size to be 6, since 6 * 8 = 48 bits, which is 6 bytes, but instead the size is 8 bytes.

Can anyone explain to me why, and how exactly bit fields and their alignments work?

EDIT

I forgot to explain the situation where I use the struct. Let's say I receive a packet of 6 bytes in this form: void * packet

I then cast the data like this:

message * msg = (message *)packet;

Now I want to print the value of each member. Although I declared the members as 6 bits wide, each member occupies 8 bits, which gives the wrong result when printing. For example, suppose I receive the following data:

00001111 11110000 00110011 00001111 00111100 00011100

I thought the values of the members would be as shown below:

first_char = 000011

second = 111111

third = 000000

fourth = 110011

fifth = 000011

sixth = 110011

seventh = 110000

eigth = 011100

But that is not what is happening. I hope I explained it well; if not, please tell me.

Solution

Almost everything about bit-fields is implementation defined. In particular, how bit-fields are packed together is implementation defined. An implementation need not let bit-fields cross the boundaries of addressable storage units, and it appears that yours does not.

ISO/IEC 9899:2011 §6.7.2.1 Structure and union specifiers

¶11 An implementation may allocate any addressable storage unit large enough to hold a bit-field. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit. If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined. The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined. The alignment of the addressable storage unit is unspecified.

And that is by no means the end of the 'implementation-defined' features of bit-fields.

[Please choose the answer by Kerek SB rather than this one as it has the crucial information about §6.7.2.1 ¶4 as well.]


Example code

#include <stdio.h>

#if !defined(BITFIELD_BASE_TYPE)
#define BITFIELD_BASE_TYPE char
#endif

int main(void)
{
    typedef struct Message
    {
        unsigned BITFIELD_BASE_TYPE first_char   : 6;
        unsigned BITFIELD_BASE_TYPE second_char  : 6;
        unsigned BITFIELD_BASE_TYPE third_char   : 6;
        unsigned BITFIELD_BASE_TYPE fourth_char  : 6;
        unsigned BITFIELD_BASE_TYPE fifth_char   : 6;
        unsigned BITFIELD_BASE_TYPE sixth_char   : 6;
        unsigned BITFIELD_BASE_TYPE seventh_char : 6;
        unsigned BITFIELD_BASE_TYPE eighth_char  : 6;
    } Message;

    typedef union Bytes_Message
    {
        Message m;
        unsigned char b[sizeof(Message)];
    } Bytes_Message;

    Bytes_Message u;

    printf("Message size: %zu\n", sizeof(Message));

    u.m.first_char   = 0x3F;
    u.m.second_char  = 0x15;
    u.m.third_char   = 0x2A;
    u.m.fourth_char  = 0x11;
    u.m.fifth_char   = 0x00;
    u.m.sixth_char   = 0x23;
    u.m.seventh_char = 0x1C;
    u.m.eighth_char  = 0x3A;

    printf("Bit fields: %.2X %.2X %.2X %.2X %.2X %.2X %.2X %.2X\n",
           u.m.first_char,   u.m.second_char, u.m.third_char,
           u.m.fourth_char,  u.m.fifth_char,  u.m.sixth_char,
           u.m.seventh_char, u.m.eighth_char);

    printf("Bytes:     ");
    for (size_t i = 0; i < sizeof(Message); i++)
        printf(" %.2X", u.b[i]);
    putchar('\n');

    return 0;
}

Sample compilations and runs

Testing on Mac OS X 10.9.2 Mavericks with GCC 4.9.0 (64-bit build; sizeof(int) == 4 and sizeof(long) == 8). Source code is in bf.c; the program created is bf.

$ gcc -DBITFIELD_BASE_TYPE=char -O3 -g -std=c11 -Wall -Wextra -Wmissing-prototypes -Wstrict-prototypes -Wold-style-definition -Werror bf.c -o bf
$ ./bf
Message size: 8
Bit fields: 3F 15 2A 11 00 23 1C 3A
Bytes:      3F 15 2A 11 00 23 1C 3A
$ gcc -DBITFIELD_BASE_TYPE=short -O3 -g -std=c11 -Wall -Wextra -Wmissing-prototypes -Wstrict-prototypes -Wold-style-definition -Werror bf.c -o bf
$ ./bf
Message size: 8
Bit fields: 3F 15 2A 11 00 23 1C 3A
Bytes:      7F 05 6A 04 C0 08 9C 0E
$ gcc -DBITFIELD_BASE_TYPE=int -O3 -g -std=c11 -Wall -Wextra -Wmissing-prototypes -Wstrict-prototypes -Wold-style-definition -Werror bf.c -o bf
$ ./bf
Message size: 8
Bit fields: 3F 15 2A 11 00 23 1C 3A
Bytes:      7F A5 46 00 23 A7 03 00
$ gcc -DBITFIELD_BASE_TYPE=long -O3 -g -std=c11 -Wall -Wextra -Wmissing-prototypes -Wstrict-prototypes -Wold-style-definition -Werror bf.c -o bf
$ ./bf
Message size: 8
Bit fields: 3F 15 2A 11 00 23 1C 3A
Bytes:      7F A5 46 C0 C8 E9 00 00
$

Note that there are 4 different sets of results for the 4 different type sizes. Note, too, that a compiler is not required to allow these types. The standard says (§6.7.2.1 again):

¶4 The expression that specifies the width of a bit-field shall be an integer constant expression with a nonnegative value that does not exceed the width of an object of the type that would be specified were the colon and expression omitted.[122] If the value is zero, the declaration shall have no declarator.

¶5 A bit-field shall have a type that is a qualified or unqualified version of _Bool, signed int, unsigned int, or some other implementation-defined type.

[122] While the number of bits in a _Bool object is at least CHAR_BIT, the width (number of sign and value bits) of a _Bool may be just 1 bit.


Another sub-question

Can you explain to me why I was wrong with thinking I would get the size of 6? I asked a lot of my friends but they don't know much about bit-fields.

I'm not sure I know all that much about bit-fields. I've never used them except in answers to questions on Stack Overflow. They're of no use when writing portable software, and I aim to write portable software — or, at least, software that is not gratuitously non-portable.

I imagine that you assumed a layout of the bits roughly equivalent to this:

+------+------+------+------+------+------+------+------+
|  f1  |  f2  |  f3  |  f4  |  f5  |  f6  |  f7  |  f8  |
+------+------+------+------+------+------+------+------+

It is supposed to represent 48 bits in 8 groups of 6 bits, laid out contiguously with no spaces or padding.

Now, one reason why that can't happen is the rule from §6.7.2.1 ¶4: when you use a type T for a bit-field, the width of the bit-field cannot be larger than CHAR_BIT * sizeof(T). In your code, T was unsigned char, so bit-fields cannot be larger than 8 bits or else they would cross storage unit boundaries. Of course, yours are only 6 bits, but it means that you can't fit a second 6-bit field into the same 8-bit storage unit. If T is unsigned short, then only two 6-bit fields fit into a 16-bit storage unit; if T is a 32-bit int, then five 6-bit fields can fit; if T is a 64-bit unsigned long, then ten 6-bit fields can fit.

Another reason is that access to such bit-fields that cross byte boundaries would be moderately inefficient. For example, given (Message as defined in my example code):

Message bf = …initialization code…

int nv = 0x2A;
bf.second_char = nv;

Suppose that the code treated the values as being stored in a packed byte array with fields overlapping byte boundaries. Then the code needs to set the bits marked y below:

             Byte 0             |            Byte 1
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| x | x | x | x | x | x | y | y | y | y | y | y | z | z | z | z |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

This is a pattern of bits. The x bits might correspond to first_char; the z bits might correspond to part of third_char; and the y bits to the old value of second_char. So, the assignment has to copy the first 6 bits of Byte 0 and assign 2 bits of the new value to the last two bits:

((unsigned char *)&bf)[0] = (((unsigned char *)&bf)[0] & 0xFC) | ((nv >> 4) & 0x03);
((unsigned char *)&bf)[1] = (((unsigned char *)&bf)[1] & 0x0F) | ((nv << 4) & 0xF0);

If it is treated as a 16-bit unit, then the code would be equivalent to:

((unsigned short *)&bf)[0] = (((unsigned short *)&bf)[0] & 0xFC0F) | ((nv << 4) & 0x03F0);

The 32-bit or 64-bit assignments are somewhat similar to the 16-bit version:

((unsigned int  *)&bf)[0] = (((unsigned int  *)&bf)[0] & 0xFC0FFFFF) |
                                           ((nv << 20) & 0x03F00000);
((unsigned long *)&bf)[0] = (((unsigned long *)&bf)[0] & 0xFC0FFFFFFFFFFFFF) |
                                           (((unsigned long)nv << 52) & 0x03F0000000000000);

This makes a particular set of assumptions about the way the bits are laid out inside the bit-field. Different assumptions come up with slightly different expressions, but something analogous to this is needed if the bit-field is treated as a contiguous array of bits.

By comparison, with the 6-bits per byte layout actually used, the assignment becomes much simpler:

((unsigned char *)&bf)[1] = nv & 0x3F;

and it would be legitimate for the compiler to omit the mask operation shown, because the values in the padding bits are indeterminate (but the store would still have to be an 8-bit assignment).

The amount of code needed to access a bit-field is one reason why most people avoid them. The fact that different compilers can make different layout assumptions for the same definition means that values cannot be reliably passed between machines of different types. Usually, an ABI will define the details that Standard C does not, but if one machine is a PowerPC or SPARC and the other is based on Intel, then all bets are off. It becomes better to do the shifting and masking yourself; at least the cost of the computation is visible.
