Are there performance issues when using pragma pack(1)?


Problem description

Our headers use #pragma pack(1) around most of our structs (used for net and file I/O). I understand that it changes the alignment of structs from the default of 8 bytes, to an alignment of 1 byte.

Assuming that everything is run in 32-bit Linux (perhaps Windows too), is there any performance hit that comes from this packing alignment?

I'm not concerned about portability for libraries, but more with compatibility of file and network I/O with different #pragma packs, and performance issues.

Solution

Memory access is fastest when it can take place at word-aligned memory addresses. The simplest example is the following struct (which @Didier also used):

struct sample {
   char a;
   int b;
};

By default, GCC inserts padding, so a is at offset 0, and b is at offset 4 (word-aligned). Without padding, b isn't word-aligned, and access is slower.
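The layout difference can be checked directly with offsetof and sizeof. The following is a minimal sketch, assuming a compiler that accepts #pragma pack(push, 1) (GCC, Clang, and MSVC do); the names sample_packed and print_layouts are made up for illustration and do not come from the question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Default layout: the compiler may insert padding so that b is aligned. */
struct sample {
    char a;
    int  b;
};

/* Same members with 1-byte packing (hypothetical name). */
#pragma pack(push, 1)
struct sample_packed {
    char a;
    int  b;
};
#pragma pack(pop)

/* Print the offset of b and the total size for both layouts. */
void print_layouts(void)
{
    /* With pack(1), b must directly follow the single byte of a. */
    assert(offsetof(struct sample_packed, b) == 1);

    printf("default: offsetof(b)=%zu sizeof=%zu\n",
           offsetof(struct sample, b), sizeof(struct sample));
    printf("packed:  offsetof(b)=%zu sizeof=%zu\n",
           offsetof(struct sample_packed, b), sizeof(struct sample_packed));
}
```

Calling print_layouts() from main on a typical 32-bit x86 GCC build should report b at offset 4 in the default struct and at offset 1 in the packed one; the exact default values depend on the platform ABI.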

How much slower?

  • For 32-bit x86, according to the Intel 64 and IA-32 Architectures Software Developer's Manual (http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html):

    The processor requires two memory accesses to make an unaligned memory access; aligned accesses require only one memory access. A word or doubleword operand that crosses a 4-byte boundary or a quadword operand that crosses an 8-byte boundary is considered unaligned and requires two separate memory bus cycles for access.

    As with most performance questions, you'd have to benchmark your application to see how much of an issue this is in practice.

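  • According to Wikipedia, x86 extensions like SSE2 have stricter alignment requirements: the aligned load/store instructions (such as MOVDQA) expect 16-byte-aligned memory operands and fault otherwise.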
  • Many other architectures require word alignment (and will generate SIGBUS errors if data structures aren't word-aligned).
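When raw bytes from a file or socket have to be read at an arbitrary offset, one common way to stay safe on strict-alignment architectures is to copy the bytes out with memcpy instead of dereferencing a cast pointer. This is a sketch of that idea, not something the question's code is known to do; the helper name load_u32_unaligned is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Read a 32-bit value from an address that may not be 4-byte aligned.
 * memcpy has no alignment requirement, so this avoids the misaligned
 * load that a cast such as *(const uint32_t *)p could produce.
 */
uint32_t load_u32_unaligned(const void *p)
{
    uint32_t value;

    assert(p != NULL);
    memcpy(&value, p, sizeof value);
    return value;
}
```

Compilers generally turn a fixed-size memcpy like this into a single load on targets where unaligned access is cheap, so the helper usually costs nothing extra on x86.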

Regarding portability: I assume that you're using #pragma pack(1) so that you can send structs across the wire and to and from disk without worrying about different compilers or platforms packing structs differently. This is valid; however, there are a couple of issues to keep in mind:

  • This does nothing to handle big endian versus little endian issues. You can handle these by calling the htons family of functions (htons, htonl, ntohs, ntohl) on any multi-byte integer fields in your structs.
  • In my experience, working with packed, serializable structs in application code isn't a lot of fun. They're very difficult to modify and extend without breaking backwards compatibility, and as already noted, there are performance penalties. Consider transferring your packed, serializable structs' contents into equivalent non-packed, extensible structs for processing, or consider using a full-fledged serialization library like Protocol Buffers (which has C bindings).
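Both suggestions can be combined: keep the packed struct only at the I/O boundary and convert to a native struct for processing, fixing byte order during the copy. The sketch below assumes a POSIX system (for arpa/inet.h); the message layout and the names wire_msg, msg, msg_to_wire, and msg_from_wire are invented for illustration.

```c
#include <arpa/inet.h>  /* htons, htonl, ntohs, ntohl (POSIX) */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical on-the-wire layout: packed, fields in network byte order. */
#pragma pack(push, 1)
struct wire_msg {
    uint8_t  type;
    uint32_t length;   /* big endian on the wire */
    uint16_t flags;    /* big endian on the wire */
};
#pragma pack(pop)

/* Native struct used by application code: default alignment, host order. */
struct msg {
    uint8_t  type;
    uint16_t flags;
    uint32_t length;
};

/* Convert a native message to its wire representation. */
void msg_to_wire(const struct msg *in, struct wire_msg *out)
{
    assert(in != NULL && out != NULL);
    out->type   = in->type;
    out->length = htonl(in->length);
    out->flags  = htons(in->flags);
}

/* Convert a received wire message back to the native representation. */
void msg_from_wire(const struct wire_msg *in, struct msg *out)
{
    assert(in != NULL && out != NULL);
    out->type   = in->type;
    out->length = ntohl(in->length);
    out->flags  = ntohs(in->flags);
}
```

With this split, the unaligned accesses happen once per message in the conversion functions, and the rest of the code works on an aligned struct that can grow new fields without changing the wire format.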
