Largest data type which can be fetch-ANDed atomically?

Question

I wanted to try to atomically reset 256 bits using something like this:

#include <x86intrin.h>
#include <iostream>
#include <array>
#include <atomic>

int main(){

    std::array<std::atomic<__m256i>, 10> updateArray;

    __m256i allZeros = _mm256_setzero_si256();

    updateArray[0].fetch_and(allZeros);
}

But I get compiler errors about the element not having fetch_and(). Is this impossible because a 256-bit type is too large to guarantee atomicity?

Is there any other way I can implement this? I am using GCC.

If not, what is the largest type I can reset atomically: 64 bits?

Could any AVX instructions perform the fetch-AND atomically?

Recommended answer

So there are a few different things that need to be solved:

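  1. What can the processor do?
  2. What do we mean by atomic?
  3. Can you make the compiler generate code for what the processor can do?
  4. Does the C++11/14 standard support that?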

For #1 and #2:

In x86, there are instructions to do 8, 16, 32, 64, 128, 256 and 512 bit operations. One processor will [at least if the data is aligned to its own size] perform that operation atomically. However, for an operation to be "truly atomic", it also needs to prevent race conditions within the update of that data [in other words, prevent some other processor from reading, modifying and writing back that same location]. Aside from a small number of "implied lock" instructions, this is done by adding a "lock prefix" to a particular instruction - this performs the right kind of cache-talk [technical term] with the other processors in the system to ensure that ONLY THIS processor can update this data.
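For sizes the hardware can lock directly, the read-modify-write described above is available through std::atomic on integral types. The following is a minimal sketch, assuming a 64-bit word is enough; the helper name clear_bits64 is made up for illustration and is not part of the original answer. On x86-64, GCC typically compiles this kind of call to a lock-prefixed instruction.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper: atomically clears the bits that are set in `mask`
// and returns the value the word held just before the update.
inline std::uint64_t clear_bits64(std::atomic<std::uint64_t>& word,
                                  std::uint64_t mask) {
    // fetch_and is one indivisible read-modify-write on integral atomics.
    return word.fetch_and(~mask, std::memory_order_seq_cst);
}
```

Calling clear_bits64(word, ~0ull) resets all 64 bits in a single atomic step.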

We can't use VEX instructions with a LOCK prefix (from Intel's manual):

Any VEX-encoded instruction with a LOCK prefix preceding VEX will #UD

You need a VEX prefix to use AVX instructions, and #UD means "undefined instruction" - in other words, the code will cause a processor exception if we try to execute it.

So, it is 100% certain that the processor cannot do a locked (atomic read-modify-write) operation on 256 bits at a time. This answer discusses SSE instruction atomicity: "SSE instructions: which CPUs can do atomic 16B memory operations?"

#3 is pretty meaningless if the instruction isn't valid.

#4 - well, the standard supports std::atomic<uintmax_t>, and if uintmax_t happens to be 128 or 256 bits, then you could certainly do that. I'm not aware of any processor supporting 128 or more bits for uintmax_t, but the language doesn't prevent it.
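To see what the standard library offers on a given platform, one can query the width of uintmax_t and whether its atomic is lock-free. This is only a sketch; the function names uintmax_bits and uintmax_atomic_is_lock_free are invented here for illustration.

```cpp
#include <atomic>
#include <cstdint>

// Width of uintmax_t in bits on the current platform.
inline unsigned uintmax_bits() {
    return static_cast<unsigned>(sizeof(std::uintmax_t) * 8);
}

// Reports whether atomic operations on uintmax_t avoid a hidden lock here.
inline bool uintmax_atomic_is_lock_free() {
    std::atomic<std::uintmax_t> probe{0};
    return probe.is_lock_free();
}
```

If is_lock_free() returns false, the library still provides the operations, but implements them with an internal lock rather than a single instruction.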

If the requirement for "atomic" isn't as strong as "need to ensure with 100% certainty that no other processor updates this at the same time", then using regular SSE, AVX or AVX512 instructions would suffice - but there will be race conditions if you have two processors (cores) doing read/modify/write operations on the same bit of memory simultaneously.
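One way to illustrate a weaker requirement, which is an assumption of this sketch and not something the original answer proposes, is to store the 256 bits as four 64-bit atomics and AND each word separately. Every word update is atomic, but the 256-bit update as a whole is not. The names Bits256 and and_each_word are hypothetical.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical 256-bit block stored as four independently atomic words.
struct Bits256 {
    std::array<std::atomic<std::uint64_t>, 4> words;
};

// ANDs each 64-bit word with the matching mask word.
// Each word update is atomic, but another thread may observe a state
// where some words are already updated and others are not.
inline void and_each_word(Bits256& block,
                          const std::array<std::uint64_t, 4>& mask) {
    for (std::size_t i = 0; i < block.words.size(); ++i) {
        block.words[i].fetch_and(mask[i], std::memory_order_seq_cst);
    }
}
```

This is only acceptable when readers can tolerate seeing a partially updated block; otherwise a mutex around the whole block is the simpler choice.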

The largest atomic operation on x86 is CMPXCHG16B, which will swap two 64-bit integer registers with the content in memory if the value in two other registers MATCHES the value in memory. So you could come up with something that reads one 128-bit value, ANDs out some bits, and then stores the new value back atomically if nothing else got in there first - if that happened, you have to repeat the operation, and of course, it's not a single atomic AND operation either.
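The read, AND, compare-and-swap, retry pattern described above can be sketched with std::atomic's compare-exchange. The helper name fetch_and_via_cas is made up for illustration. It is written as a template so it can be tried with a 64-bit type; using it with a 16-byte type such as GCC's unsigned __int128 is an assumption that depends on compiler flags (for example -mcx16) and possibly linking libatomic, and whether CMPXCHG16B is actually emitted is not guaranteed by this code.

```cpp
#include <atomic>
#include <cstdint>

// Emulates fetch-AND with a compare-exchange retry loop.
// T is any type std::atomic accepts that also supports operator&.
template <typename T>
T fetch_and_via_cas(std::atomic<T>& target, T mask) {
    T expected = target.load(std::memory_order_relaxed);
    // On failure, compare_exchange_weak writes the current value into
    // `expected`, so the next iteration recomputes from fresh data.
    while (!target.compare_exchange_weak(expected,
                                         static_cast<T>(expected & mask),
                                         std::memory_order_seq_cst,
                                         std::memory_order_relaxed)) {
    }
    return expected;  // value held immediately before the successful update
}
```

The loop only retries when another thread changed the value between the read and the swap, so under low contention it usually completes in one pass.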

Of course, on platforms other than Intel and AMD, the behaviour may be different.
