无锁引用计数和C ++智能指针 [英] Lock-free Reference counting and C++ smart pointers

查看:194
本文介绍了无锁引用计数和C ++智能指针的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

通常,C ++中引用计数智能ptr类的最广为人知的实现,包括标准 std :: shared_ptr ,使用原子引用计数,但不提供原子访问同一个智能ptr实例。换句话说,多个线程可以安全地在指向同一共享对象的单独的 shared_ptr 实例上操作,但是多个线程不能安全地读/写实例 $ shared_ptr 的实例,而不提供某种类型的同步,例如互斥或任何。



原子版本a shared_ptr 调用 atomic_shared_ptr 已经 提出 和初步实现已经存在。推测, atomic_shared_ptr 可以很容易地用自旋锁或互斥锁实现,但是无锁实现也是可能的。



在研究了一些这些实现之后,有一点很明显:实现一个无锁的 std :: shared_ptr 是非常困难的,似乎需要这么多 compare_and_exchange 操作,使我怀疑一个简单的自旋锁是否会达到更好的性能。



这是很难实现的主要原因无锁引用计数指针是因为在读取共享控制块(或共享对象本身,如果我们正在谈论一个侵入式共享指针)之间始终存在的竞争,以及修改引用计数。



换句话说,你甚至不能安全地读引用计数,因为你永远不知道什么时候有其他线程释放引用计数存在的内存。



因此,一般来说,使用各种复杂的策略来创建无锁版本。此处的实施方式看起来像是使用双引用计数策略,其中存在计数并发访问 shared_ptr 实例的线程数的本地引用,然后计算指向的shared_ptr实例的数量的共享或全局引用到共享对象。



鉴于所有这些复杂性,我非常惊讶地发现Dobbs博士的文章,从 2004 不少(在C ++ 11之前原子论),似乎无法解决这个整个问题:



http://www.drdobbs.com/atomic-reference-counting-pointers/184401888



看起来作者声称不知何故能够:



$ b - 所有这样的方式,没有其他线程可以
导致不正确的结果


但我真的不明白他实际实现这种方式的方式。他使用(非便携式)PowerPC指令(LL / SC原语 lwarx stwcx )将其关闭。



这样做的相关代码是他所谓的 aIandF (原子增量和提取)他定义为:

  addr aIandF(addr r1){
addr tmp; int c;
do {
do {
tmp = * r1;
if(!tmp)break;
c = lwarx(tmp);
} while(tmp!= * r1);
} while(tmp&&&!stwcx(tmp,c + 1));
return tmp;
};

显然, addr 到拥有引用计数变量的共享对象。



我的问题是:这是唯一可能的结构支持LL / SC操作?看起来这是不可能做到 cmpxchg 。其次,这是如何工作的?我已经阅读了这段代码几次,现在,我不能真正了解发生了什么。我了解 LL / SC 基元的作用,我只是可以'



我最好理解的是, addr r1 指向共享对象的指针,以及 指向引用计数的指针的地址(我猜这意味着引用计数变量需要是 struct 定义共享对象)。然后他取消引用 addr (获取共享对象的实际地址)。然后,他链接加载存储在 tmp 中的地址的值,并将结果存储在 c 中。这是计数器值。然后他有条件地将增加的值(如果 tmp 已更改)失败,则返回 tmp

我不明白是如何工作的。共享对象的地址可能永远不会改变,并且LL / SC可以成功 - 但是如果另一个线程在同一时间释放了共享对象,如何帮助我们?

解决方案

  addr aIandF(addr r1){
addr tmp; int c;
do {
do {
// r1保存地址的地址
//引用的地址
tmp = * r1; //获取引用的地址
if(!tmp)break; //如果为null,保护

//读取当前引用
//并获取保留
c = lwarx(tmp);

//现在我们保留预留,
//检查是否有另一个线程
//改变了共享块地址
} while(tmp!= * r1); //如果是这样,从

开始//如果存储成功,我们知道我们持有
//整个
}的预留,而(tmp&&!stwcx ,c + 1));
return tmp;
};

请注意, aIandF 现有共享指针的副本,声明该副本的引用。



Dr Dobbs文章描述了释放引用的操作,首先使用本地函数的空指针在源共享指针对象中原子地交换共享计数器的地址;然后原子地递减计数器;然后测试以查看递减的结果是否为零。这个操作顺序很重要:你说,共享对象的地址可能永远不会改变,LL / SC可以成功 - 但是如果另一个线程同时释放了共享对象,这又如何帮助我们呢? - 但是这绝对不会发生,因为对象将永远不会被释放,没有交换首先发生,给我们一种观察地址变化的方法。



aIandF 在输入时测试计数器地址为空。



如果发生在 lwarx 之前,它可以发现地址变为空值,因为它会明确地测试一次有预订。



如果在递减线程中交换发生在lwarx之后,我们实际上并不关心:如果 stwcx code> aIandF 成功,我们知道递减线程将看到新的引用计数,而不是销毁对象,我们可以继续知道我们已经声明了一个引用;而如果其他线程成功地递减计数器,我们将失去我们的保留,存储将失败,我们将在下一次循环迭代中检测到对象的销毁。



此算法假设一个强一致的内存模型(所有线程总是看到对方的读取和写入的程序顺序的影响) - 即使在支持ll / sc的现代架构上也不一定这样。



编辑:考虑它,算法也显然假定从一次有效的存储器地址读取总是安全的(例如,没有MMU /保护;或者,算法被破坏):

  if 

//另一个线程可以在这一点上做它的交换,
//递减*和*销毁对象tmp指向
//,然后我们才能做任何事情否则

c = lwarx(tmp);

//如果发生了,我们将检测到这个事实,并且不使用c
//,但是只有当lwarx不捕获
//时内存tmp指向
//当其他线程释放对象时获取未映射)


In general, most widely known implementations of reference-counting smart ptr classes in C++, including the standard std::shared_ptr, use atomic reference counting, but do not provide atomic access to the same smart ptr instance. In other words, multiple threads may safely operate on separate shared_ptr instances which point to the same shared object, but multiple threads cannot safely read/write instances of the same shared_ptr instance without providing some kind of synchronization such as a mutex or whatever.

An atomic version of a shared_ptr called "atomic_shared_ptr" has been proposed, and preliminary implementations already exist. Presumably, atomic_shared_ptr could easily be implemented with a spin lock or mutex, but a lock-free implementation is also possible.

After studying some of these implementations, one thing is obvious: implementing a lock-free std::shared_ptr is very difficult, and seems to require so many compare_and_exchange operations as to make me question whether a simple spin lock would achieve better performance.

The main reason that it's so difficult to implement a lock-free reference-counted pointer is because of the race that always exists between reading the shared control block (or the shared object itself, if we're talking about an intrusive shared pointer), and modifying the reference count.

In other words, you can't even safely read the reference count because you never know when some other thread has deallocated the memory where the reference count lives.

So in general, various complex strategies are employed to create lock-free versions. The implementation here looks like it uses a double-reference count strategy, where there are "local" references which count the number of threads concurrently accessing the shared_ptr instance, and then "shared" or "global" references which count the number of shared_ptr instances pointing to the shared object.

Given all this complexity, I was really surprised to find a Dr. Dobbs article, from 2004 no less (way before C++11 atomics) that seems to nonchalantly solve this entire problem:

http://www.drdobbs.com/atomic-reference-counting-pointers/184401888

It looks like the author is claiming to somehow be able to:

"... [read] the pointer to the counter, increments the counter, and returns the pointer—all in such a manner that no other threads can cause an incorrect result"

But I don't really understand the way he actually implements this. He's using (non-portable) PowerPC instructions (the LL/SC primitives lwarx and stwcx) to pull this off.

The relevant code that does this is what he calls an "aIandF" (atomic increment and fetch), which he defines as:

addr aIandF(addr r1){
  addr tmp;int c;
  do{
    do{
      tmp = *r1;
      if(!tmp)break;
      c = lwarx(tmp);
    }while(tmp != *r1);
  }while(tmp && !stwcx(tmp,c+1));
  return tmp;
};

Apparently, addr is a pointer type pointing to the shared object that owns the reference count variable.

My question(s) is:, is this only possible to do with an architecture that supports LL/SC operations? It seems it would be impossible to do with cmpxchg. And secondly, how exactly does this work? I've read over this code a few times now, and I can't really understand what's going on. I understand what LL/SC primitives do, I just can't make any sense of the code.

The best I can understand is that addr r1 is the address of the pointer to the shared object, and also the address of the pointer to the reference count (which I guess means the reference count variable needs to be the first member of the struct that defines the shared object). He then dereferences addr (getting the actual address of the shared object). Then, he linked loads the value stored at the address in tmp, and stores the result in c. This is the counter value. He then conditionally stores that value incremented (which will fail if tmp has changed) back into tmp.

What I don't understand is how this works. The address of the shared object may never change and the LL/SC could succeed - but how does this help us if another thread has deallocated the shared object in the mean time?

解决方案

addr aIandF(addr r1){
  addr tmp;int c;
  do{
    do{
        // r1 holds the address of the address 
        // of the refcount
      tmp = *r1;    // grab the address of the refcount
      if(!tmp)break;// if it's null, bail

        // read current refcount
        // and acquire reservation
      c = lwarx(tmp);

        // now we hold the reservation, 
        // check to see if another thread 
        // has changed the shared block address
    }while(tmp != *r1); // if so, start over

    // if the store succeeds we know we held 
    // the reservation throughout
  }while(tmp && !stwcx(tmp,c+1));
  return tmp;
};

Note that aIandF is used specifically when constructing a copy of an existing shared pointer, claiming a reference for the copy.

The Dr Dobbs article describes the operation for releasing a reference as first atomically swapping the address of the shared counter in the source shared pointer object with a null pointer local to the function; then atomically decrementing the counter; then testing to see if the result of the decrement was zero. This order of operations is important: you say, "The address of the shared object may never change and the LL/SC could succeed - but how does this help us if another thread has deallocated the shared object in the mean time?" - but this can never happen, since the object will never be deallocated without the swap happening first, giving us a means to observe the change of address.

aIandF tests for the counter address being null on entry.

It can spot the address becoming null if that occurs before the lwarx, because it explicitly tests for this once it has the reservation.

If the swap in the decrementing thread occurs after the lwarx we do not actually care: if the stwcx in aIandF succeeds, we know the decrementing thread will see the new reference count and not destruct the object, and we can proceed knowing we have claimed a reference to it; whereas if the other thread succeeds in decrementing the counter first, we will lose our reservation, the store will fail and we'll detect the destruction of the object on the next loop iteration.

This algorithm assumes a strongly consistent memory model (all threads always see effects of each other's reads and writes in program order) - this is not necessarily the case even on those modern architectures that do support ll/sc.

EDIT: thinking about it, the algorithm also apparently makes the assumption that it is always safe to read from a memory address that was once valid (e.g., no MMU/protection; or, the algorithm is broken):

if(!tmp)break;

// another thread could, at this point, do its swap, 
// decrement *and* destroy the object tmp points to
// before we get to do anything else

c = lwarx(tmp); 

// if that happened, we'll detect this fact and do nothing with c
// but ONLY if the lwarx doesn't trap 
// (due to the memory tmp points to 
// getting unmapped when the other thread frees the object)

这篇关于无锁引用计数和C ++智能指针的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆