How is thread synchronization implemented, at the assembly language level?


Question

While I'm familiar with concurrent programming concepts such as mutexes and semaphores, I have never understood how they are implemented at the assembly language level.

I imagine there being a set of memory "flags" saying:

  • lock A is held by thread 1
  • lock B is held by thread 3
  • lock C is not held by any thread

But how is access to these flags synchronized between threads? Something like this naive example would only create a race condition:

  mov edx, [myThreadId]
wait:
  cmp [lock], 0
  jne wait
  mov [lock], edx
  ; I wanted an exclusive lock but the above 
  ; three instructions are not an atomic operation :(

Recommended answer

  • In practice, these tend to be implemented with CAS (compare-and-swap) or LL/SC (load-linked/store-conditional), plus some spinning before giving up the thread's time slice, usually by calling into a kernel function that switches context.
  • If you only need a spinlock, Wikipedia gives an example that trades CAS for a lock-prefixed xchg on x86/x64. So, strictly speaking, CAS is not needed to build a spinlock, but some kind of atomicity is still required. Here it comes from an atomic operation that writes a register to memory and returns the previous contents of that memory location in a single step. (To clarify a bit more: the lock prefix asserts the LOCK# signal, which ensures that the current CPU has exclusive access to the memory. On today's CPUs it is not necessarily carried out this way, but the effect is the same. Note that xchg with a memory operand is locked implicitly, so the prefix is redundant there. By using xchg we make sure that we cannot be preempted between the read and the write, since an instruction is not interrupted half-way. If we had an imaginary lock mov reg0, mem / lock mov mem, reg1 pair (which we don't), that would not be the same: the thread could be preempted between the two movs.)
  • On current architectures, as pointed out in the comments, you mostly end up using the atomic primitives of the CPU together with the coherency protocols provided by the memory subsystem.
  • For this reason, you not only have to use these primitives, but also have to account for the cache/memory coherency guaranteed by the architecture.
  • There may be implementation nuances as well. Consider a spinlock, for example:
    • instead of a naive implementation, you should probably use something like a TTAS (test and test-and-set) spinlock with exponential backoff
    • on a Hyper-Threaded CPU, you should probably issue pause instructions as a hint that you are spinning, so that the core you are running on can do something useful in the meantime
    • you should really give up on spinning and yield control to other threads after a while
    • etc.
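
The points above can be sketched in portable C11 rather than raw assembly. This is only an illustration under stated assumptions, not the implementation any particular library uses. All the names (demo_spinlock, demo_trylock, demo_lock, demo_unlock, demo_test_and_set, cpu_relax, DEMO_MAX_BACKOFF) are made up for this example, sched_yield assumes a POSIX system, and the backoff limit is an arbitrary value. On x86, compilers typically lower atomic_exchange to xchg and a compare-exchange to lock cmpxchg; on LL/SC architectures the same calls usually become a load-linked/store-conditional retry loop.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <sched.h>                 /* sched_yield(): assumes a POSIX platform */

#if defined(__x86_64__)
#include <immintrin.h>
#define cpu_relax() _mm_pause()    /* emits the x86 pause spin hint */
#else
#define cpu_relax() ((void)0)      /* this sketch uses no spin hint elsewhere */
#endif

#define DEMO_MAX_BACKOFF 1024u     /* arbitrary cap, chosen for illustration */

/* 0 means "free"; any other value is the id of the owning thread. */
typedef struct { atomic_int owner; } demo_spinlock;
#define DEMO_SPINLOCK_INIT { 0 }

/* xchg-style primitive: store 1 and return the previous value in one step. */
static inline int demo_test_and_set(atomic_int *flag)
{
    return atomic_exchange_explicit(flag, 1, memory_order_acquire);
}

/* CAS-style primitive: the question's "compare with 0, then store my id",
 * done as one atomic operation instead of three separate instructions. */
static inline bool demo_trylock(demo_spinlock *l, int my_id)
{
    int expected = 0;
    assert(my_id != 0);            /* 0 is reserved for "free" */
    return atomic_compare_exchange_strong_explicit(
        &l->owner, &expected, my_id,
        memory_order_acquire, memory_order_relaxed);
}

/* TTAS with exponential backoff: test with a plain load first, attempt the
 * atomic update only when the lock looks free, and wait longer after each
 * failure. Once the backoff is capped, also yield to other threads. */
static inline void demo_lock(demo_spinlock *l, int my_id)
{
    unsigned backoff = 1;
    for (;;) {
        if (atomic_load_explicit(&l->owner, memory_order_relaxed) == 0 &&
            demo_trylock(l, my_id))
            return;
        for (unsigned i = 0; i < backoff; i++)
            cpu_relax();
        if (backoff < DEMO_MAX_BACKOFF)
            backoff *= 2;
        else
            sched_yield();
    }
}

/* Release: a store with release ordering makes the critical section's
 * writes visible before the lock is seen as free. */
static inline void demo_unlock(demo_spinlock *l)
{
    atomic_store_explicit(&l->owner, 0, memory_order_release);
}
```

A thread would call demo_lock(&l, id) with its own nonzero id, do its work, and then call demo_unlock(&l). Production mutexes usually go one step further than sched_yield and block in the kernel (for example through a futex-like call) once spinning has gone on long enough.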