__int128 alignment segmentation fault with gcc -O SSE optimization

Question

I use __int128 as a struct member. It works fine with -O0 (no optimization).

However, it crashes with a segmentation fault when optimization is enabled (-O1).

It crashes at a movdqa instruction, which requires its memory operand to be 16-byte aligned, while the address comes from malloc(), which aligns only to 8 bytes.

I tried to disable SSE optimization with -mno-sse, but then the code fails to compile:

/usr/include/x86_64-linux-gnu/bits/stdlib-float.h:27:1: error: SSE register return with SSE disabled

So what can I do if I want to use both __int128 and -O1?

Thanks in advance, Wu

BTW, it seems OK if __int128 is used only on the stack (not on the heap).

==== EDIT ====

Sorry, I did not tell the whole truth.

In fact, I did not use malloc(). I used a memory pool library that returns addresses aligned to 8 bytes. I mentioned malloc() only to keep things simple.

After testing, I found that malloc() returns 16-byte-aligned memory, and that the __int128 member is also aligned to 16 bytes within the struct.
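A check along these lines could confirm both observations. This is only a sketch: `struct record`, `is_aligned` and `malloc_is_aligned_for_record` are hypothetical names, not the asker's code, and the values in the comments assume gcc on x86-64 Linux.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical struct standing in for the asker's type. */
struct record {
    int id;
    __int128 value;   /* GCC/Clang extension, 64-bit targets only */
};

/* Returns 1 if p is a multiple of `alignment` bytes, else 0. */
int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}

/* Returns 1 if a fresh malloc() block is aligned enough for struct record. */
int malloc_is_aligned_for_record(void)
{
    struct record *r = malloc(sizeof *r);
    if (r == NULL)
        return 0;
    int ok = is_aligned(r, alignof(struct record));
    free(r);
    return ok;
}
```

On such a target, `alignof(struct record)` and `offsetof(struct record, value)` should both be 16, and `malloc_is_aligned_for_record()` should return 1. Running `is_aligned` on a pointer from the memory pool is a quick way to see whether the pool is the culprit.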

So the problem is only in my memory pool library.
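One possible fix on the pool side is to round each returned address up to the requested alignment. The thread does not show the actual pool library, so the bump allocator below is purely illustrative; `struct pool`, `pool_init`, `pool_alloc` and `align_up` are made-up names, and the alignment is assumed to be a power of two.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bump-style pool over a caller-supplied buffer. */
struct pool {
    unsigned char *base;  /* start of the backing buffer */
    size_t size;          /* total bytes in the buffer */
    size_t used;          /* bytes handed out so far, including padding */
};

void pool_init(struct pool *p, void *buf, size_t size)
{
    p->base = buf;
    p->size = size;
    p->used = 0;
}

/* Round addr up to the next multiple of alignment (a power of two). */
static uintptr_t align_up(uintptr_t addr, size_t alignment)
{
    return (addr + (alignment - 1)) & ~(uintptr_t)(alignment - 1);
}

/* Returns n bytes aligned to `alignment`, or NULL if the pool is exhausted. */
void *pool_alloc(struct pool *p, size_t n, size_t alignment)
{
    uintptr_t start = (uintptr_t)p->base + p->used;
    size_t padding = (size_t)(align_up(start, alignment) - start);
    size_t remaining = p->size - p->used;

    if (padding > remaining || n > remaining - padding)
        return NULL;

    size_t offset = p->used + padding;
    p->used = offset + n;
    return p->base + offset;
}
```

With this shape, a caller would request `pool_alloc(&p, sizeof(struct T), alignof(struct T))` so that a struct containing __int128 gets a 16-byte-aligned address regardless of how the backing buffer itself is aligned.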

Thanks a lot.

Recommended answer

For x86-64 System V, alignof(max_align_t) == 16, so malloc always returns 16-byte-aligned pointers. It sounds like your allocator is broken, and it would also violate the ABI if used for long double. (Reposting this as an answer because it turned out to be the answer.)

Memory returned by malloc is guaranteed to be able to hold any standard type, which means it is aligned enough for any such type that fits in the allocation.

This can't be 32-bit code, because gcc doesn't support __int128 on 32-bit targets. (32-bit glibc malloc only guarantees 8-byte alignment.)

In general, the compiler is allowed to generate code that faults if you violate a type's alignment requirement. On x86, misaligned accesses typically just work until the compiler uses a SIMD instruction that requires alignment. Even auto-vectorization with a misaligned uint16_t* can fault (see "Why does unaligned access to mmap'ed memory sometimes segfault on AMD64?"), so don't assume that narrow types are always safe. Use memcpy if you need to express an unaligned load in C.
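The memcpy idiom mentioned above might look like the following minimal sketch. The function names `load_i128_unaligned` and `store_i128_unaligned` are invented for illustration.

```c
#include <string.h>

/* Read a __int128 from an address that may not be 16-byte aligned.
   memcpy makes no alignment assumption about src, so the compiler
   must not emit an alignment-requiring load for it. */
__int128 load_i128_unaligned(const void *src)
{
    __int128 v;
    memcpy(&v, src, sizeof v);
    return v;
}

/* Write a __int128 to an address that may not be 16-byte aligned. */
void store_i128_unaligned(void *dst, __int128 v)
{
    memcpy(dst, &v, sizeof v);
}
```

Optimizing compilers usually turn such a fixed-size memcpy into a couple of plain or unaligned moves rather than a library call, so the cost is typically small. This works around a misaligned buffer; it does not make it valid to dereference a misaligned `__int128 *` directly.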

Apparently alignof(__int128) is 16. So the ABI designers aren't repeating the weirdness from i386 System V, where even 8-byte objects are only guaranteed 4-byte alignment and the struct-packing rules mean that compilers can't give them natural alignment.

This is a Good Thing, because it makes copying with SSE efficient, and it means _Atomic __int128 doesn't need any extra special handling to avoid cache-line splits that would make lock cmpxchg16b very slow.
