Strict aliasing and writing int via char*


Question

In an old program I serialized a data structure to bytes by allocating an array of unsigned char and then storing ints into it like this:

*((int*)p) = value;

(where p is the unsigned char* and value is the value to be stored).

This worked fine, except when compiled on Sparc, where it triggered exceptions due to accessing memory with improper alignment. That made perfect sense: the data elements had varying sizes, so p quickly became unaligned, and the error was triggered when p was used to store an int value, because the underlying Sparc instructions require alignment.

This was quickly fixed (by writing the value out to the char array byte by byte). But I'm a bit concerned, because I've used this construction in many programs over the years without issue. Clearly I'm violating some C rule (strict aliasing?), and whereas this case was easily discovered, maybe such violations can cause other, more subtle kinds of undefined behavior with optimizing compilers and so on. I'm also a bit puzzled, because I believe I've seen constructions like this in a lot of C code over the years. I'm thinking of hardware drivers that describe the data exchanged with the hardware as structs (using pack(1), of course) and write those to h/w registers. So it seems to be a common technique.

So my question is: exactly which rule does the above violate, and what is the proper C way to realize this use case (i.e. serializing data to an array of unsigned char)? Of course, custom serialization functions can be written to write everything out byte by byte, but that sounds cumbersome and not very efficient.

Finally, can ill effects (apart from alignment problems) generally be expected from violating this aliasing rule?

Recommended answer

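Yes, your code violates the strict aliasing rule. In C, only char* and its signed and unsigned counterparts are assumed to alias other types. The permission runs in one direction: a character pointer may be used to inspect the bytes of an int, but an int* may not be used to access an object declared as an array of unsigned char.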
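To illustrate the permitted direction, here is a minimal sketch that reads an object's bytes through an unsigned char pointer. The helper name copy_bytes is made up for this example and is not from the original question or answer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy the object representation of any object
   into out, reading it one byte at a time through an unsigned char
   pointer. A character type looking at another type is permitted. */
void copy_bytes(const void *obj, size_t n, unsigned char *out)
{
    const unsigned char *p = (const unsigned char *)obj;
    assert(obj != NULL && out != NULL);
    for (size_t i = 0; i < n; i++)
        out[i] = p[i];
}

/* The reverse, e.g. *(int *)some_unsigned_char_array = value;,
   is the pattern from the question and is not covered by this rule. */
```

A caller would pass the address and size of the object, for example copy_bytes(&x, sizeof x, bytes) with bytes declared as unsigned char bytes[sizeof x].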

So the proper way to do such raw serialization is to create an array of ints and then treat it as an unsigned char buffer.

int arr[] = { 1, 2, 3, 4, 5 };
unsigned char* rawData = (unsigned char*)arr;

You can memcpy or fwrite rawData, or serialize it in some other way, and that is absolutely valid.

Deserialization code may look like this:

int* arr = (int*)calloc(5, sizeof(int));
memcpy(arr, rawData, 5 * sizeof(int));
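For the mixed-size buffer described in the question, where an int may land at an unaligned offset, the same memcpy idea can be applied per value. The following is only a sketch: put_int and get_int are hypothetical helper names, and the values are stored in the host's native byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy an int into buf at any byte offset.
   memcpy copies bytes, so the destination does not need int alignment
   and no int lvalue is used to access the unsigned char array.
   Returns the offset just past the stored value. */
size_t put_int(unsigned char *buf, size_t cap, size_t off, int value)
{
    assert(off <= cap && sizeof value <= cap - off);
    memcpy(buf + off, &value, sizeof value);
    return off + sizeof value;
}

/* Hypothetical helper: read an int back from any byte offset. */
int get_int(const unsigned char *buf, size_t cap, size_t off)
{
    int value;
    assert(off <= cap && sizeof value <= cap - off);
    memcpy(&value, buf + off, sizeof value);
    return value;
}
```

Compilers commonly turn a small fixed-size memcpy like this into a plain load or store where the target allows it, so this is usually not slower than the cast, though that depends on the compiler and optimization level.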

Of course, you still have to take care of endianness, padding and other issues to implement reliable serialization.
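One way to deal with endianness is to fix the byte order of the serialized form and build it with shifts, so the result does not depend on the host. This is a sketch under assumptions the answer does not state: a fixed-width uint32_t instead of int, big-endian order on the wire, and the made-up helper names put_u32_be and get_u32_be.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write v into out[0..3], most significant byte first. */
void put_u32_be(unsigned char *out, uint32_t v)
{
    assert(out != NULL);
    out[0] = (unsigned char)((v >> 24) & 0xFFu);
    out[1] = (unsigned char)((v >> 16) & 0xFFu);
    out[2] = (unsigned char)((v >> 8) & 0xFFu);
    out[3] = (unsigned char)(v & 0xFFu);
}

/* Hypothetical helper: read a big-endian 32-bit value from in[0..3]. */
uint32_t get_u32_be(const unsigned char *in)
{
    assert(in != NULL);
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8) | (uint32_t)in[3];
}
```

Because only unsigned char elements are read and written, this also sidesteps the alignment and aliasing questions; padding is avoided by serializing each struct member separately rather than copying the whole struct.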
