How is thread synchronization implemented, at the assembly language level?

Question

While I'm familiar with concurrent programming concepts such as mutexes and semaphores, I have never understood how they are implemented at the assembly language level.

I imagine there is a set of memory flags saying:

  • lock A is held by thread 1
  • lock B is held by thread 3
  • lock C is not held by any thread
  • etc

But how is access to these flags synchronized between threads? Something like this naive example would only create a race condition:

  mov edx, [myThreadId]
wait:
  cmp [lock], 0
  jne wait
  mov [lock], edx
  ; I wanted an exclusive lock but the above 
  ; three instructions are not an atomic operation :(


Recommended answer

  • In practice, these tend to be implemented with CAS and LL/SC (compare-and-swap and load-linked/store-conditional), plus some spinning before the thread gives up its time slice, usually by calling into a kernel function that switches context.
  • If you only need a spinlock, Wikipedia gives an example that trades CAS for a lock-prefixed xchg on x86/x64. So, strictly speaking, CAS is not needed to build a spinlock, but some kind of atomicity still is. Here it comes from an atomic operation that writes a register to memory and returns the previous contents of that memory slot in a single step. (To clarify a bit more: the lock prefix asserts the #LOCK signal, which ensures that the current CPU has exclusive access to the memory. On today's CPUs it is not necessarily carried out this way, but the effect is the same; xchg with a memory operand behaves as locked even without the prefix. By using xchg we make sure that we cannot be preempted between the read and the write, since an instruction is not interrupted half-way. If we had an imaginary lock mov reg0, mem / lock mov mem, reg1 pair (which we don't), that would not be the same: the thread could be preempted between the two movs.)
  • On current architectures, as pointed out in the comments, you mostly end up using the atomic primitives of the CPU and the coherency protocols provided by the memory subsystem.
  • For this reason, you not only have to use these primitives, but also account for the cache/memory coherency guaranteed by the architecture.
  • There may be implementation nuances as well. Consider e.g. a spinlock:
    • instead of a naive implementation, you should probably use e.g. a TTAS (test-and-test-and-set) spinlock with some exponential backoff,
    • on a Hyper-Threaded CPU, you should probably issue pause instructions as a hint that you're spinning, so that the core you are running on can do something useful in the meantime,
    • you should really give up on spinning and yield control to other threads after a while,
    • etc...
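
The points above can be put together in a short sketch. It is written with C11 atomics rather than raw assembly so that it stays portable; on x86, GCC and Clang typically compile the compare-and-swap below to lock cmpxchg and an atomic exchange to xchg. The names ttas_lock, ttas_try_lock, ttas_lock_acquire, ttas_unlock, cpu_relax and MAX_BACKOFF are made up for this illustration, and the backoff limit is an arbitrary assumption rather than a tuned value. As in the question, the lock word holds 0 when free and the owner's thread id otherwise.

```c
#include <assert.h>
#include <sched.h>      /* sched_yield (POSIX) */
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_BACKOFF 1024u   /* arbitrary cap, chosen for illustration only */

/* 0 means "free"; any other value is the id of the owning thread. */
typedef struct { atomic_uint owner; } ttas_lock;

/* Hint to the core that we are busy-waiting (GCC/Clang builtin on x86). */
static inline void cpu_relax(void) {
#if defined(__x86_64__) || defined(__i386__)
    __builtin_ia32_pause();   /* emits the x86 pause instruction */
#endif
}

/* One atomic compare-and-swap replaces the racy cmp / jne / mov sequence:
 * the store of `self` happens only if the lock word is still 0. */
static bool ttas_try_lock(ttas_lock *l, unsigned self) {
    unsigned expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &l->owner, &expected, self,
        memory_order_acquire, memory_order_relaxed);
}

static void ttas_lock_acquire(ttas_lock *l, unsigned self) {
    unsigned backoff = 1;
    for (;;) {
        /* "Test": wait on plain loads, which do not need exclusive
         * ownership of the cache line. */
        while (atomic_load_explicit(&l->owner, memory_order_relaxed) != 0) {
            for (unsigned i = 0; i < backoff; i++)
                cpu_relax();
            if (backoff < MAX_BACKOFF)
                backoff *= 2;      /* exponential backoff */
            else
                sched_yield();     /* stop spinning, let other threads run */
        }
        /* "Test-and-set": attempt the atomic operation only when the
         * lock looked free a moment ago. */
        if (ttas_try_lock(l, self))
            return;
    }
}

static void ttas_unlock(ttas_lock *l, unsigned self) {
    /* Only the owner may release the lock. */
    assert(atomic_load_explicit(&l->owner, memory_order_relaxed) == self);
    atomic_store_explicit(&l->owner, 0, memory_order_release);
}
```

A thread with a non-zero id `me` would wrap its critical section in `ttas_lock_acquire(&l, me);` and `ttas_unlock(&l, me);`. The acquire and release orderings are what tie the lock to the memory-coherency point above: writes made inside the critical section become visible to the next thread that acquires the lock.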