How do I atomically move a 64-bit value in x86 ASM?


Question

First, I found this question: How do I atomically read a value in x86 ASM? But it's a bit different: in my case I want to atomically assign a floating-point value (a 64-bit double) in a 32-bit application.

From the "Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3A":

The Pentium processor (and newer processors since) guarantees that the following additional memory operations will always be carried out atomically:

Reading or writing a quadword aligned on a 64-bit boundary

Is it actually possible using some assembly trick?

Recommended answer

In 64-bit x86 asm, you can use an integer mov rax, [rsi], or x87 or SSE2. As long as the address is 8-byte aligned (or, on Intel P6 and later CPUs, as long as the access doesn't cross a cache-line boundary), the load or store will be atomic.
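The two conditions above (8-byte alignment, or not crossing a cache-line boundary) can be checked on an address like this. It is only a sketch: the helper names and the Shared struct are made up for illustration, and the 64-byte line size is an assumption that holds on typical current x86 CPUs but is not guaranteed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// ASSUMPTION: 64-byte cache lines (typical on current x86, not guaranteed).
constexpr std::size_t kAssumedCacheLine = 64;

// True if addr is a multiple of 8, the alignment the manual's guarantee names.
inline bool is_8byte_aligned(std::uintptr_t addr) {
    return addr % 8 == 0;
}

// True if the byte range [addr, addr + size) spans two cache lines.
inline bool crosses_cache_line(std::uintptr_t addr, std::size_t size,
                               std::size_t line = kAssumedCacheLine) {
    assert(size > 0 && size <= line);
    return addr / line != (addr + size - 1) / line;
}

// Hypothetical shared object. alignas(8) matters in 32-bit code, where the
// i386 System V ABI only aligns a double inside a struct to 4 bytes.
struct Shared {
    alignas(8) double value;
};
static_assert(alignof(Shared) >= 8, "value must be 8-byte aligned");
```

An 8-byte access at an 8-byte-aligned address can never cross a line boundary, so the alignment check alone is the portable one to rely on.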

In 32-bit x86 asm, your only option using only integer registers is lock cmpxchg8b, but that sucks for a pure load or pure store. (You can use it as a load by setting expected = desired = 0, although not on read-only memory, because it always performs a write.) (gcc/clang use lock cmpxchg16b for atomic<struct_16_bytes> in 64-bit mode, but some compilers simply choose to make 16-byte objects not lock-free.)
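The expected = desired = 0 trick can be sketched in C++ with std::atomic. The function name load_via_cas is made up for illustration, and whether the compiler actually emits lock cmpxchg8b for it depends on the target and compiler.

```cpp
#include <atomic>
#include <cstdint>

// Read a 64-bit value using a compare-and-swap with expected = desired = 0.
// Takes a non-const reference because a CAS is a write operation, which is
// why this cannot be used on read-only memory.
inline std::int64_t load_via_cas(std::atomic<std::int64_t>& shared) {
    std::int64_t expected = 0;
    // If shared == 0, the CAS succeeds and stores 0 over 0 (no visible change).
    // Otherwise it fails and copies the current value into expected.
    shared.compare_exchange_strong(expected, 0);
    return expected;
}
```

Either way, expected ends up holding the value that was in memory, and the stored value is unchanged.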

So the answer is: don't use integer regs: fild qword / fistp qword can copy any bit-pattern without changing it. (As long as the x87 precision control is set to full 64-bit mantissa). This is atomic for aligned addresses on Pentium and later.

On a modern x86, use SSE2 movq load or store. e.g.

; atomically store edx:eax to qword [edi], assuming [edi] is 8-byte aligned
movd   xmm0, eax            ; xmm0[31:0]  = eax (SSE2)
pinsrd xmm0, edx, 1         ; xmm0[63:32] = edx (SSE4.1)
movq   [edi], xmm0          ; single 8-byte store

With only SSE1 available, use movlps. (For loads, you may want to break the false-dependency on the old value of the xmm register with xorps).

With MMX, movq to/from mm0-7 works.

gcc uses SSE2 movq, SSE1 movlps, or x87 fild/fistp, in that order of preference, for std::atomic<int64_t> in 32-bit mode. Clang -m32 unfortunately uses lock cmpxchg8b even when SSE2 is available: LLVM bug 33109.

Some versions of gcc are configured so that -msse2 is on by default even with -m32 (in which case you could use -mno-sse2 or -march=i486 to see what gcc does without it).
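For the original goal (atomically assigning a double in a 32-bit application), it may be simplest to let the compiler choose the instruction via std::atomic. A minimal sketch, where shared_value and the wrapper names are hypothetical; which instruction you get (movq, movlps, fild/fistp, or lock cmpxchg8b) depends on the compiler and flags, as described above.

```cpp
#include <atomic>

// Hypothetical shared variable; alignas(8) makes the 8-byte alignment explicit.
alignas(8) std::atomic<double> shared_value{0.0};

// memory_order_relaxed asks only for atomicity, with no ordering against
// other memory operations; pass a stronger order if you need one.
inline void store_shared(double v) {
    shared_value.store(v, std::memory_order_relaxed);
}

inline double load_shared() {
    return shared_value.load(std::memory_order_relaxed);
}

// Reports whether this target implements the atomic without a lock.
inline bool shared_is_lock_free() {
    return shared_value.is_lock_free();
}
```

Compiling this with -m32 and then with -m32 -mno-sse2 and reading the asm is one way to see which strategy your gcc picks.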

I put load and store functions on the Godbolt compiler explorer to see asm from gcc with x87, SSE, and SSE2, and from clang 4.0.1 and ICC 18.

gcc bounces through memory as part of int->xmm or xmm->int, even when SSE4 (pinsrd / pextrd) is available. This is a missed optimization (gcc bug 80833). In 64-bit mode it favours ALU movd + pinsrd / pextrd with -mtune=intel or -mtune=haswell, but apparently not in 32-bit mode or not for this use-case (64-bit integers in XMM instead of proper vectorization). Anyway, remember that only the load or store from atomic<long long> shared has to be atomic; the other loads/stores to the stack are private.
