C ++ Treiber Stack和原子下一指针 [英] C++ Treiber Stack and atomic next pointers

查看:64
本文介绍了C ++ Treiber Stack和原子下一指针的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

" Treiber Stack "通常是其中之一最简单的无锁数据结构,因此在教授无锁算法入门时经常使用它.

The "Treiber Stack" is generally one of the simplest lock-free data structures, and so it is often used when teaching introductions to lock-free algorithms.

我已经看到许多使用C ++原子的Treiber Stacks实现.该算法本身是微不足道的,所以真正的挑战是处理无锁数据结构的所有其他附带细节,例如提供某种方式来执行安全的内存回收,避免ABA问题以及以无锁的方式分配节点.这可以通过多种方式解决,例如使用原子引用计数,危险指针,避免ABA的计数/标记指针以及使用无锁内存池.

I've seen many implementations of Treiber Stacks using C++ atomics. The algorithm itself is trivial, so the real challenge is handling all the other incidental details of lock-free data-structures, such as providing some way of performing safe memory reclamation, avoiding the ABA problem, and allocating nodes in a lock-free manner. This can be solved in various ways, such as using atomic reference counting, hazard pointers, counted/tagged pointers to avoid ABA, and using a lock-free memory pool.

但是忽略了所有这些细节,只关注简单的算法本身,我想到的一个问题是,我可以回忆起的Treiber Stacks的每种实现都使用 atomic next指针实现了节点类..例如:

But ignoring all those details and focusing on the simple algorithm itself, one question that occurred to me is the fact that every implementation of Treiber Stacks that I can recall implements the node class using atomic next pointers. For example:

struct Node
{
  T value;
  std::atomic<Node*> next;
};

但是在考虑了算法之后,我不确定为什么下一个指针需要是原子的.

But after thinking about the algorithm, I'm not sure why the next pointer needs to be atomic.

一般的PUSH算法(忽略无锁分配,安全内存回收,退避,避免ABA等)是:

The general PUSH algorithm (ignoring lock-free allocation, safe memory reclamation, backoff, ABA avoidance, etc.) is:

Node* n = new Node();
Node* front = m_front.load();
n->next.store(front);
while (!m_front.compare_exchange_weak(front, n))
{
  n->next.store(front);
}

一般的POP算法(同样,忽略实际算法逻辑之外的所有细节)是:

The general POP algorithm (again, ignoring all details except the actual algorithmic logic) is:

Node* front = m_front.load();
Node* next = front->next.load();
while (!m_front.compare_exchange_weak(front, next))
{
  next = front->next.load();
}

这是PUSH算法的真实示例实现:

And here is an real-world example implementation of the PUSH algorithm:

https://github.com/khizmax/libcds/blob/master/cds/intrusive/treiber_stack.h#L736

所以我不明白为什么为什么下一个指针甚至需要是原子的.大多数C ++实现使用 next 指针使用轻松的加载/存储,因此在读取/写入下一个指针时我们不需要任何内存隔离,但是我认为它不需要原子的.

So I don't understand why the next pointer even needs to be atomic. Most C++ implementations use relaxed loads/stores with the next pointer, so we don't need any memory fences when reading/writing to the next pointer, but my thinking is that it doesn't need to be atomic at all.

据我所见,任何时候都不会同时写入任何节点的下一个指针.相反,下一个指针可能会同时 loading ,但是我从未见过该算法能够同时加载+存储或同时存储+存储的机会.实际上,在PUSH算法中,根本不会同时访问下一个指针.

From what I can see, at no time is the next pointer of any node concurrently written to. Rather, the next pointer may be concurrently loaded, but I never see any opportunity for the algorithm to concurrently load+store or concurrently store+store. In fact, in the PUSH algorithm, the next pointer is never accessed concurrently at all.

所以在我看来,并发访问时,下一个指针实际上是只读"的,所以我不确定为什么甚至有必要使它们成为原子的.

So it seems to me that next pointers are effectively "read-only" when accessed concurrently, so I'm not sure why it would even be necessary to make them atomic at all.

但是,我见过的Treiber Stack的每个C ++实现都会使下一个指针成为原子指针.所以我是对的,还是出于某些原因必须使下一个指针成为原子指针?

Yet, every C++ implementation of a Treiber Stack I've seen makes the next pointers to be atomic. So am I correct, or is there some reason the next pointer must be made atomic?

推荐答案

如果它和显示的代码一样简单,那将是对的.在发布 Node 的指针之后,再也不会对其进行修改.但是您省略了清理节点的部分,以便可以对其进行垃圾收集.(在弹出后,您不能只是 delete ;另一个线程可能仍具有指向它的指针,但尚未读取它.这对于RCU也是一个棘手的问题.)

If it was as simple as the code you showed, you'd be right. A Node is never modified after a pointer to it is published. But you left out the part where Nodes are cleaned up so they can be garbage-collected. (You can't just delete after popping; another thread could still have a pointer to it and not yet have read it. This is a tricky problem for RCU as well.)

这是您遗漏的功能,在 pop 中的CAS成功后调用:

This is the function you left out, called after a CAS in pop succeeds:

protected:
    void clear_links( node_type * pNode ) CDS_NOEXCEPT
    {
        pNode->m_pNext.store( nullptr, memory_model::memory_order_relaxed );
    }

这是一个顺序,在此顺序中,阅读器在编写时会读取 next :

Here's an ordering where a reader reads a next while it's being written:

A: Node* front = m_front.load();
                                 B: Node* front = m_front.load();  // same value
A: Node* next = front->next.load();
A: m_front.compare_exchange_weak(front, next) // succeeds, no loop
A: clear_links(front);  // i.e. front->next.store(nullptr);

                                 B: front->next.load();

因此,C ++未定义的行为,就标准合规性而言,故事的结尾.

Thus, C++ Undefined Behaviour, end of story as far as standards compliance is concerned.

在实践中,非原子负载

In practice, a non-atomic load will happen to be atomic anyway on most CPU architectures, or at worst experience tearing. (IDK of any ISA where it results in anything unpredictable other than the value, but C++ leaves this option open).

我不确定在任何情况下撕裂的值实际上可能会被使用(放入 m_front ),因为 clear_links()在CAS成功后才能运行.如果CAS在一个线程中成功执行,它将在另一个线程中失败,因为它只会尝试使用旧的 front 作为预期的 front 撕裂的 next 值.code> arg到CAS.

I'm not sure there's any scenario where a torn value could actually get used (put into m_front), because clear_links() can't run until after a successful CAS. And if CAS succeeded in one thread, it will fail in the other thread because it will only try a torn next value with the old front as the expected arg to CAS.

实际上,几乎每个人关心的每个实现都不会为原子放松的负载/存储增加额外的开销,而对于指针大小的对象来说则没有常规的开销.实际上,如果原子对于指针不是免费的" .

In practice, pretty much every implementation anyone cares about has no extra cost for relaxed-atomic loads/stores vs. regular for pointer-sized objects. In fact, this stack pretty much sucks if atomicity isn't "free" for a pointer.

例如在AVR(使用16位指针的8位RISC微控制器)上,仅对数据结构进行锁定会比较便宜,而不是让 std :: atomic 在每次加载/存储在此算法中.(特别是因为没有多核AVR CPU,因此实现锁很便宜.)

e.g. on AVR (8-bit RISC microcontrollers which use 16-bit pointers), it would be cheaper to just take a lock on the data structure, instead of letting std::atomic use locks for every load/store in this algo. (Especially since there aren't multi-core AVR CPUs, so locks are probably very cheap to implement.)

atomic<> 还使编译器假定某个值可以由另一个线程异步修改.因此,它避免了优化负载或存储的麻烦,有点像 volatile .(但另请参见为什么编译器不合并多余的std :: atomic写操作?/a>.)我认为这里不需要任何东西,而且不会发生.

atomic<> also gets the compiler to assume that a value could be asynchronously modified by another thread. So it stops it from ever optimizing away a load or store, somewhat like volatile. (But see also Why don't compilers merge redundant std::atomic writes?.) I don't think there's anything here where that's needed, and wouldn't already happen.

非原子操作按原子获取和释放操作排序,类似于宽松的原子操作,并且CAS修改 front ,因此front-> next 具有新的 front`因此,非原子负载无法优化.

Non-atomic ops are ordered by atomic acquire and release operations, similar to relaxed atomic operations, and CAS modifies front, so front->nexthas a newfront` so a non-atomic load couldn't optimize away.

看看在替换 atomic< Node *>之后是否从编译器获得相同的asm输出,这可能是一个有趣的实验.next Node * next .(或者使用 non_atomic 包装类,该包装类仍然具有加载/存储成员函数,因此您无需修改​​太多代码.)

It might be an interesting experiment to see if you get identical asm output from a compiler after replacing atomic <Node*> next with Node *next. (Or with a non_atomic wrapper class that still has load/store member functions so you don't have to modify much code).

使用轻松的原子存储对我来说看起来不错.您肯定不想用显示的方式实现它,并使用 seq_cst 存储作为初始化尚未发布任何指针的新对象的一部分.那时不需要原子性,但是原子性是免费的(在普通CPU上),因此避免原子性没有任何好处.没有任何商店或货物可以被优化.

Using relaxed atomic stores looks good to me. You definitely don't want to implement it the way you showed, with seq_cst stores as part of initializing a new object that hasn't had any pointers to it published yet. At that point atomicity is not needed, but it's free (on normal CPUs) so there's no benefit to avoiding it. None of the stores or loads could be optimized away.

这篇关于C ++ Treiber Stack和原子下一指针的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆