How to mix atomic and non-atomic operations in C++?


Question

The std::atomic types allow atomic access to variables, but I would sometimes like non-atomic access, for example when the access is protected by a mutex. Consider a bitfield class that allows both multi-threaded access (via insert) and single-threaded vectorized access (via operator|=):

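#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>   // posix_memalign, free (POSIX)
#include <mutex>
#include <new>
#include <set>

class Bitfield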
{
    const size_t size_, word_count_;
    std::atomic<size_t> * words_;
    // mutable so that operator|= can lock the mutex of a const operand
    mutable std::mutex mutex_;

public:

    Bitfield (size_t size) :
        size_(size),
        word_count_((size + 8 * sizeof(size_t) - 1) / (8 * sizeof(size_t)))
    {
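        // make sure words are 32-byte aligned
        // (posix_memalign takes a void ** and returns nonzero on failure)
        void * raw = nullptr;
        if (posix_memalign(&raw, 32, word_count_ * sizeof(std::atomic<size_t>)) != 0) {
            throw std::bad_alloc();
        }
        words_ = static_cast<std::atomic<size_t> *>(raw);
        for (size_t i = 0; i < word_count_; ++i) {
            new(words_ + i) std::atomic<size_t>(0);
        }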
    }
    ~Bitfield () { free(words_); }

private:
    void insert_one (size_t pos)
    {
        size_t mask = size_t(1) << (pos % (8 * sizeof(size_t)));
        std::atomic<size_t> * word = words_ + pos / (8 * sizeof(size_t));
        word->fetch_or(mask, std::memory_order_relaxed);
    }
public:
    void insert (const std::set<size_t> & items)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        // do some sort of multi-threaded insert, with TBB or #pragma omp
        // (parallel_foreach is a placeholder, not a real library call)
        parallel_foreach(items.begin(), items.end(),
                         [this](size_t pos) { insert_one(pos); });
    }

    void operator |= (const Bitfield & other)
    {
        assert(other.size_ == size_);
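        if (this == &other) return; // x |= x is a no-op; locking one mutex twice would be an error
        std::unique_lock<std::mutex> lock1(mutex_, std::defer_lock);
        std::unique_lock<std::mutex> lock2(other.mutex_, std::defer_lock);
        std::lock(lock1, lock2); // edited to lock other.mutex_ as well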
        // allow gcc to autovectorize (256 bits at once with AVX)
        static_assert(sizeof(size_t) == sizeof(std::atomic<size_t>), "fail");
        size_t * __restrict__ words = reinterpret_cast<size_t *>(words_);
        const size_t * __restrict__ other_words
            = reinterpret_cast<const size_t *>(other.words_);
        for (size_t i = 0, end = word_count_; i < end; ++i) {
            words[i] |= other_words[i];
        }
    }
};

Note that operator|= is very close to what's in my real code, but insert(std::set) is just attempting to capture the idea that one can:

acquire lock;
make many atomic accesses in parallel;
release lock;
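
The three steps above could be sketched roughly as follows, using plain std::thread in place of TBB or OpenMP. This is only an illustration under assumptions: the names insert_parallel, count_bits and kWordBits, the vector-based storage, and the caller-chosen worker count are all hypothetical and do not come from the question's real code.

```cpp
#include <atomic>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

constexpr std::size_t kWordBits = 8 * sizeof(std::size_t);

// Hypothetical helper: sets every bit position listed in `items`,
// splitting the work across `workers` threads. The caller must make
// sure every position fits inside `words`.
void insert_parallel(std::vector<std::atomic<std::size_t>> & words,
                     std::mutex & mutex,
                     const std::vector<std::size_t> & items,
                     unsigned workers)
{
    std::lock_guard<std::mutex> lock(mutex);      // acquire lock
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&words, &items, w, workers] {
            // worker w handles items w, w + workers, w + 2 * workers, ...
            for (std::size_t i = w; i < items.size(); i += workers) {
                const std::size_t pos = items[i];
                words[pos / kWordBits].fetch_or(
                    std::size_t(1) << (pos % kWordBits),
                    std::memory_order_relaxed);   // atomic access
            }
        });
    }
    for (std::thread & t : pool) {
        t.join();                                 // all workers are done here
    }
}                                                 // release lock

// Counts the set bits; meant to be called when no insert is running.
std::size_t count_bits(const std::vector<std::atomic<std::size_t>> & words)
{
    std::size_t total = 0;
    for (const auto & word : words) {
        for (std::size_t v = word.load(std::memory_order_relaxed); v != 0; v &= v - 1) {
            ++total;  // each iteration clears the lowest set bit
        }
    }
    return total;
}
```

Joining the workers before the lock is released means that whoever takes the mutex next sees every bit that was set during the parallel phase.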

My question is this: what is the best way to mix such atomic and non-atomic access? Answers to [1,2] below suggest that casting is wrong (and I agree). But surely the standard allows such apparently safe access?

More generally, can one use a reader-writer lock and allow "readers" to read and write atomically, and the unique "writer" to read and write non-atomically?

  1. How to use std::atomic efficiently
  2. Accessing atomic<int> of C++0x as non-atomic

Recommended answer

Standard C++ prior to C++11 had no multithreaded memory model. I see no changes in the standard that would define the memory model for non-atomic accesses, so those get similar guarantees as in a pre-C++11 environment.

In theory it is actually even worse than using memory_order_relaxed: the cross-thread behavior of non-atomic accesses is simply undefined, whereas relaxed atomic accesses allow several possible orders of execution, one of which must eventually happen.
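
To make the contrast concrete, here is a small sketch of the relaxed side. The function name count_relaxed and its parameters are made up for illustration. Relaxed increments impose no ordering on other memory, yet each increment is indivisible, so the final total is well defined.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical helper: `threads` threads each add 1 to a shared counter
// `per_thread` times, using relaxed atomic increments.
long count_relaxed(unsigned threads, long per_thread)
{
    std::atomic<long> counter{0};
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, per_thread] {
            for (long i = 0; i < per_thread; ++i) {
                // No ordering guarantees, but no increment is ever lost.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (std::thread & t : pool) {
        t.join();
    }
    // With a plain `long counter` and `++counter`, the same loops would
    // race with each other and the program would have undefined behavior.
    return counter.load(std::memory_order_relaxed);
}
```

Calling count_relaxed(4, 10000) returns 40000 on every run; the order in which the threads interleave varies, but the result does not.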

So, to implement such patterns while mixing atomic and non-atomic accesses, you will still have to rely on platform-specific, non-standard constructs (for example, _ReadBarrier) and/or intimate knowledge of the particular hardware.

A better alternative is to get familiar with the memory_order enum and aim for optimal assembly output from a given piece of code and compiler. The end result may be correct, portable, and free of unwanted memory fences, but if you are like me, you should expect to disassemble and analyze several buggy versions first. Even then, there is no guarantee that using atomic accesses on all code paths will not produce some superfluous fences on a different architecture or a different compiler.
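
As one illustration of what the memory_order enum offers, the sketch below pairs a release store with an acquire load so that a plain, non-atomic variable can be handed from one thread to another. The function name publish_and_read and the value 42 are arbitrary choices for the example, not part of the answer.

```cpp
#include <atomic>
#include <thread>

// Hypothetical demo: one thread writes a plain int and then sets an atomic
// flag; the other waits for the flag and then reads the int.
int publish_and_read()
{
    int payload = 0;                  // plain, non-atomic data
    std::atomic<bool> ready{false};   // atomic flag guarding the payload
    int seen = -1;

    std::thread producer([&] {
        payload = 42;                                  // non-atomic write
        ready.store(true, std::memory_order_release);  // publishes the write above
    });
    std::thread consumer([&] {
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();                 // wait for the flag
        }
        seen = payload;  // ordered after the producer's write by acquire/release
    });

    producer.join();
    consumer.join();
    return seen;
}
```

The atomic and non-atomic accesses here touch different objects, and the acquire/release pair orders them, so the consumer always reads 42. Whether the generated code contains more fences than strictly needed still depends on the compiler and target.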

So the best practical answer is simplicity first. Design your cross-thread interactions to be as simple as you can make them without completely killing scalability, responsiveness or any other sacred cow; have nearly no shared mutable data structures; and access them as rarely as you can, always atomically.
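
Applied to the bitfield from the question, "always atomically" might look like the sketch below. It assumes C++17 for std::shared_mutex; the class name SimpleBitfield and the contains method are hypothetical. Inserts take the lock in shared mode and use fetch_or, while the merge takes it exclusively and still goes through relaxed atomic loads and stores instead of casting to plain size_t.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <shared_mutex>
#include <vector>

// Hypothetical simplified bit set: every access to the words is atomic.
class SimpleBitfield
{
    static constexpr std::size_t kWordBits = 8 * sizeof(std::size_t);
    std::vector<std::atomic<std::size_t>> words_;   // zero-initialized
    mutable std::shared_mutex mutex_;

public:
    explicit SimpleBitfield(std::size_t size)
        : words_((size + kWordBits - 1) / kWordBits) {}

    // "Reader" side of the lock: many threads may insert concurrently.
    void insert(std::size_t pos)
    {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        words_[pos / kWordBits].fetch_or(std::size_t(1) << (pos % kWordBits),
                                         std::memory_order_relaxed);
    }

    bool contains(std::size_t pos) const
    {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        const std::size_t word = words_[pos / kWordBits].load(std::memory_order_relaxed);
        return ((word >> (pos % kWordBits)) & 1) != 0;
    }

    // "Writer" side: exclusive on *this, shared on other.
    SimpleBitfield & operator|=(const SimpleBitfield & other)
    {
        assert(other.words_.size() == words_.size());
        if (this == &other) return *this;   // avoid locking one mutex twice
        std::unique_lock<std::shared_mutex> mine(mutex_, std::defer_lock);
        std::shared_lock<std::shared_mutex> theirs(other.mutex_, std::defer_lock);
        std::lock(mine, theirs);            // deadlock-avoiding acquisition
        for (std::size_t i = 0; i < words_.size(); ++i) {
            // Relaxed load/store instead of a cast to plain size_t.
            const std::size_t merged = words_[i].load(std::memory_order_relaxed)
                                     | other.words_[i].load(std::memory_order_relaxed);
            words_[i].store(merged, std::memory_order_relaxed);
        }
        return *this;
    }
};
```

On common hardware a relaxed load or store of a word usually compiles to an ordinary move, but compilers may decline to vectorize loops over atomics, so this trades the AVX path from the question for code that stays within the standard.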
