Alignment of char array struct members in C standard


Question

Let us suppose I would like to read/write a tar file header. Considering standard C (C89, C99, or C11), do char arrays have any special treatment in structs, regarding padding? Can the compiler add padding to such a struct:

struct header {
    char name[100];
    char mode[8];
    char uid[8];
    char gid[8];
    char size[12];
    char mtime[12];
    char chksum[8];
    char typeflag;
    char linkname[100];
    char tail[255];
};

I've seen this done in code on the web: fread-ing or fwrite-ing the whole struct to the file in one chunk, assuming there will not be any padding, and of course also assuming CHAR_BIT == 8. I would think such C code is so common that the standard would address this case, but I can't find it there; maybe I would not make a good language lawyer.

EDIT

The accepted answer would give a strict, or the strictest possible, portable implementation according to one of the C standards that lets me treat these fields with standard library string functions, considering CHAR_BIT and all. I'm thinking one needs to read an array of 512 uint8_t for this, and after that maybe convert them to chars, one by one. Any easier way?

Solution

C11 (the latest freely available draft) says only "There may be unnamed padding within a structure object, but not at its beginning" (§6.7.2.1 ¶15) and "There may be unnamed padding at the end of a structure or union" (§6.7.2.1 ¶17). It gives no further restriction on padding within a structure.

The platform ABI may place more stringent requirements on padding, but relying on them makes the code platform-specific, since other platforms may have different padding requirements. The x86-64 ABI for Unix/Linux gives char 1-byte alignment, and specifies:

Structures and unions assume the alignment of their most strictly aligned component. Each member is assigned to the lowest available offset with the appropriate alignment. The size of any object is always a multiple of the object’s alignment.

An array uses the same alignment as its elements, except that a local or global array variable of length at least 16 bytes or a C99 variable-length array variable always has alignment of at least 16 bytes. [4]

Structure and union objects can require padding to meet size and alignment constraints. The contents of any padding is undefined.

[4] The alignment requirement allows the use of SSE instructions when operating on the array. The compiler cannot in general calculate the size of a variable-length array (VLA), but it is expected that most VLAs will require at least 16 bytes, so it is logical to mandate that VLAs have at least a 16-byte alignment.

This seems to imply that on this platform there will be no padding within the struct: every member has char's 1-byte alignment, so each one is placed at the lowest available offset, directly after the previous member. However, there are cases in which array variables have stricter alignment requirements so that they can be used with vector instructions; other platforms may impose such requirements on array members of structures as well.
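If you do decide to rely on the platform laying the struct out without padding, one way to make that assumption explicit is a compile-time check. The following is only a sketch, assuming a C11 compiler (for static_assert); the offsets are simply the running sums of the member sizes in the question's struct.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */

struct header {
    char name[100];
    char mode[8];
    char uid[8];
    char gid[8];
    char size[12];
    char mtime[12];
    char chksum[8];
    char typeflag;
    char linkname[100];
    char tail[255];
};

/* Compilation fails if the compiler inserted padding between or after members. */
static_assert(sizeof(struct header) == 512, "struct header must be exactly 512 bytes");
static_assert(offsetof(struct header, mode) == 100, "unexpected padding after name");
static_assert(offsetof(struct header, typeflag) == 156, "unexpected padding before typeflag");
static_assert(offsetof(struct header, linkname) == 157, "unexpected padding after typeflag");
static_assert(offsetof(struct header, tail) == 257, "unexpected padding after linkname");
```

This does not make the code portable; it only turns a silent layout mismatch into a build error on platforms where the assumption does not hold.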

If you would like to be portable while still reading the structure in a single call, you might want to look at readv. This is a vectored (scatter/gather) I/O operation, which lets you specify an array of buffers, each with its own length, to read into. For instance, for this case you might write:

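#include <sys/uio.h>   /* POSIX: readv, struct iovec */

struct header h;
struct iovec iov[10];             /* one entry per struct member */
iov[0].iov_base = &h.name;
iov[0].iov_len = sizeof(h.name);
iov[1].iov_base = &h.mode;
iov[1].iov_len = sizeof(h.mode);
/* ... etc ... */
ssize_t bytes_read = readv(fd, iov, 10);   /* fd: an already-open file descriptor */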

Note that readv is defined in POSIX/Single Unix Specification, not in the C standard. In standard C, the easiest thing to do is just read each of these elements individually (and even with vectored I/O available, reading and writing each element individually will probably be clearer unless you absolutely need a single call for the whole I/O operation).
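Reading each member individually in standard C could look roughly like the sketch below. The helper names read_field and read_header are made up for illustration and are not part of any library; the sketch assumes the stream was opened in binary mode and that CHAR_BIT is 8, so that one char in memory corresponds to one octet in the file.

```c
#include <stdio.h>

struct header {
    char name[100];
    char mode[8];
    char uid[8];
    char gid[8];
    char size[12];
    char mtime[12];
    char chksum[8];
    char typeflag;
    char linkname[100];
    char tail[255];
};

/* Hypothetical helper: reads exactly len chars into field.
   Returns 1 on success, 0 on a short read or error. */
static int read_field(FILE *fp, char *field, size_t len)
{
    return fread(field, 1, len, fp) == len;
}

/* Hypothetical helper: fills *h member by member, so any padding
   the compiler may have inserted between members is never touched.
   Returns 1 on success, 0 otherwise. */
int read_header(FILE *fp, struct header *h)
{
    return read_field(fp, h->name, sizeof h->name)
        && read_field(fp, h->mode, sizeof h->mode)
        && read_field(fp, h->uid, sizeof h->uid)
        && read_field(fp, h->gid, sizeof h->gid)
        && read_field(fp, h->size, sizeof h->size)
        && read_field(fp, h->mtime, sizeof h->mtime)
        && read_field(fp, h->chksum, sizeof h->chksum)
        && read_field(fp, &h->typeflag, sizeof h->typeflag)
        && read_field(fp, h->linkname, sizeof h->linkname)
        && read_field(fp, h->tail, sizeof h->tail);
}
```

Because every member is a char or an array of char, the fields can afterwards be passed to the standard string functions directly, as long as you account for fields that are not NUL-terminated.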

In your edit, you write:

The accepted answer would give a strict, or the strictest possible, portable implementation according to one of the C standards that lets me treat these fields with standard library string functions, considering CHAR_BIT and all. I'm thinking one needs to read an array of 512 uint8_t for this, and after that maybe convert them to chars, one by one. Any easier way?

The C specification does not guarantee that uint8_t is available: "The typedef name uintN_t designates an unsigned integer type with width N and no padding bits.... These types are optional." (C11 draft, §7.20.1.1, ¶2–3). However, if an 8-bit type such as uint8_t is available, then char is guaranteed to be 8 bits wide as well, since char is guaranteed to be at least 8 bits and to be the smallest object that is not a bit-field (§5.2.4.2.1 ¶1):

The values given below shall be replaced by constant expressions suitable for use in #if preprocessing directives. Moreover, except for CHAR_BIT and MB_LEN_MAX, the following shall be replaced by expressions that have the same type as would an expression that is an object of the corresponding type converted according to the integer promotions. Their implementation-defined values shall be equal or greater in magnitude (absolute value) to those shown, with the same sign.

— number of bits for smallest object that is not a bit-field (byte)

CHAR_BIT                              8

So, if you don't have 8-bit bytes available, you won't be able to read these fields in directly and access octets from them as individual array elements; you would have to split out the individual octets manually using bit shifting and masking. However, I know of no modern architecture that lacks 8-bit bytes (for general-purpose computing, where file I/O is at all a concern; some DSPs might, but they probably won't have standard C file I/O).

If you do have 8-bit bytes, then char is guaranteed to be 8 bits, so there is not much benefit, other than clarity, in using uint8_t instead of char. If you're really concerned, I would just make sure there is a check somewhere in your build process that CHAR_BIT is 8 and call it good.
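Such a build-time check can be as small as the following sketch, placed in any translation unit that depends on the assumption; the error message text is just an example.

```c
#include <limits.h>   /* CHAR_BIT */

/* Stop the build early on platforms where a byte is not an octet. */
#if CHAR_BIT != 8
#error "This code assumes 8-bit bytes (CHAR_BIT == 8)"
#endif
```

This works in C89 as well, since CHAR_BIT is required to be usable in #if directives.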
