Why should you lock threads?

Question

I've read a lot of examples of locking threads, but why should you lock them? From my understanding, when you start threads without joining them, they compete with the main thread and all other threads for resources and then execute, sometimes simultaneously, sometimes not.

Does locking ensure that threads DON'T execute simultaneously?

Also, what's wrong with threads executing simultaneously? Isn't that even better? (Faster overall execution.)

When you lock threads, will it lock them all or can you choose which ones you want to be locked? (Whatever locking actually does...)

By the way, I'm referring to the locking functions in the threading module, such as Lock() and acquire().

Answer

A lock allows you to force multiple threads to access a resource one at a time, rather than all of them trying to access the resource simultaneously.
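To make that concrete, here is a minimal Python sketch that assumes only the standard threading module. The names inside, max_inside and worker are hypothetical bookkeeping added to observe the effect: eight threads repeatedly enter a section guarded by one Lock, and the lock keeps the number of threads inside that section at exactly one.

```python
import threading
import time

# Minimal sketch: one threading.Lock guards a "critical section" so that at
# most one thread is inside it at any moment. The counters below exist only
# to observe that property; they are illustrative, not part of any API.
lock = threading.Lock()
inside = 0       # threads currently inside the critical section
max_inside = 0   # highest value `inside` ever reached

def worker():
    global inside, max_inside
    for _ in range(100):
        lock.acquire()      # block until no other thread holds the lock
        try:
            inside += 1
            max_inside = max(max_inside, inside)
            time.sleep(0)   # yield, inviting other threads to run
            inside -= 1
        finally:
            lock.release()  # let the next waiting thread in

threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("max threads inside at once:", max_inside)  # → max threads inside at once: 1
```

The with lock: statement is shorthand for the same acquire/try/finally/release pattern. A lock only affects the threads that actually call acquire() on it; code that never touches the lock is not blocked by it.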

As you note, usually you do want threads to execute simultaneously. However, imagine that you have two threads and they are both writing to the same file. If they try to write to the same file at the same time, their output is going to get intermingled and neither thread will actually succeed in putting into the file what it wanted to.

Now maybe this problem won't come up all the time. Most of the time, the threads won't try to write to the file all at once. But sometimes, maybe once in a thousand runs, they do. So maybe you have a bug that occurs seemingly at random and is hard to reproduce and therefore hard to fix. Ugh!

Or maybe... and this has happened at a company I worked for... you have such bugs but don't know they're there because they are extremely infrequent if your computer has only a few CPUs, and hardly any of your customers have more than 4. Then they all start buying 16-CPU boxes... and your software runs as many threads as there are CPU cores, so suddenly you're crashing a lot or getting the wrong results.

So anyway, back to the file. To prevent the threads from stepping on each other, each thread must acquire a lock on the file before writing to it. Only one thread can hold the lock at a time, so only one thread can write to the file at a time. The thread holds the lock until it is done writing to the file, then releases the lock so another thread can use the file.
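A sketch of the file case, assuming four threads share one open file handle and each writes a record in three separate write() calls. The path, the write_records helper and the record format are made up for illustration. Because each thread holds file_lock for the whole record, every line in the result comes out intact.

```python
import os
import tempfile
import threading

# Sketch: several threads write multi-part records to ONE shared file handle.
# Holding file_lock for the whole record keeps each line intact.
# The file name and record format are hypothetical, for illustration only.
file_lock = threading.Lock()
path = os.path.join(tempfile.mkdtemp(), "shared.log")

def write_records(out, thread_id, n):
    for i in range(n):
        with file_lock:  # acquired here, released when the block exits
            # Three separate writes form one logical record. Without the
            # lock, another thread's writes could land between them.
            out.write(f"thread={thread_id} ")
            out.write(f"record={i}")
            out.write("\n")

with open(path, "w") as out:
    threads = [threading.Thread(target=write_records, args=(out, t, 50))
               for t in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # every writer finishes before the file is closed

with open(path) as f:
    lines = f.read().splitlines()
print(len(lines))  # → 200
```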

If the threads are writing to different files, this problem never arises. So that's one solution: have your threads write to different files, and combine them afterward if necessary. But this isn't always possible; sometimes, there's only one of something.
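The per-thread-file alternative might look like this (again a sketch; the directory and the part-N.txt names are hypothetical). No lock is needed while writing, because no two threads share a file. The merge happens afterward in one thread, once join() has confirmed that every writer is finished.

```python
import os
import tempfile
import threading

# Sketch of the "one file per thread" alternative. No lock is needed while
# writing because no two threads ever touch the same file.
# The directory and part-N.txt names are hypothetical.
N_THREADS = 4
workdir = tempfile.mkdtemp()

def part_path(thread_id):
    return os.path.join(workdir, f"part-{thread_id}.txt")

def write_own_file(thread_id, n):
    with open(part_path(thread_id), "w") as f:
        for i in range(n):
            f.write(f"thread={thread_id} record={i}\n")

threads = [threading.Thread(target=write_own_file, args=(t, 50))
           for t in range(N_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # all parts are complete before the merge starts

# Combine afterward, in a single thread, in a fixed order.
merged_path = os.path.join(workdir, "merged.txt")
with open(merged_path, "w") as out:
    for t in range(N_THREADS):
        with open(part_path(t)) as part:
            out.write(part.read())

with open(merged_path) as f:
    lines = f.read().splitlines()
print(lines[0], "...", lines[-1])  # → thread=0 record=0 ... thread=3 record=49
```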

It doesn't have to be files. Suppose you are trying to simply count the number of occurrences of the letter "A" in a bunch of different files, one thread per file. You think, well, obviously, I'll just have all the threads increment the same memory location each time they see an "A." But! When you go to increment the variable that's keeping the count, the computer reads the variable into a register, increments the register, and then stores the value back out. What if two threads read the value at the same time, increment it at the same time, and store it back at the same time? They both start at, say, 10, increment it to 11, store 11 back. So the counter's 11 when it should be 12: you have lost one count.
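Here is a sketch of that lost update in Python. The read, add and store are written out as separate statements to mirror the register description; unsafe_increment and safe_increment are illustrative names. The locked version always reaches the expected total. Whether the unlocked version actually comes up short varies from run to run and between Python versions, which is the same hard-to-reproduce behavior described earlier.

```python
import threading

# Sketch of the lost-update race. An increment is a read, an add, and a
# store; here those steps are spelled out so the gap between them is visible.
# Function and variable names are illustrative.
N_THREADS = 8
N_INCREMENTS = 10_000

unsafe_count = 0
safe_count = 0
count_lock = threading.Lock()

def unsafe_increment():
    global unsafe_count
    for _ in range(N_INCREMENTS):
        value = unsafe_count   # read
        value += 1             # modify (the "register")
        unsafe_count = value   # store: may overwrite another thread's update

def safe_increment():
    global safe_count
    for _ in range(N_INCREMENTS):
        with count_lock:       # read-modify-write happens as one unit
            safe_count += 1

def run(target):
    threads = [threading.Thread(target=target) for _ in range(N_THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

run(unsafe_increment)
run(safe_increment)
expected = N_THREADS * N_INCREMENTS
print("unsafe:", unsafe_count, "expected:", expected)  # may be less than expected
print("safe:  ", safe_count, "expected:", expected)    # always equal
```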

Acquiring locks can be expensive, since you have to wait until whoever else is using the resource is done with it. This is why Python's Global Interpreter Lock (GIL) is a performance bottleneck: in CPython only one thread can execute Python bytecode at a time, so CPU-bound threads mostly take turns instead of running in parallel. So you may decide to avoid using shared resources at all. Instead of using a single memory location to hold the number of "A"s in your files, each thread keeps its own count, and you add them all up at the end (similar to the solution I suggested with the files, funnily enough).
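And a sketch of the per-thread-count approach. To keep it self-contained, short in-memory strings stand in for the files (an assumption, not part of the original problem). Each thread counts into a private variable and writes the result once into its own slot of a list; the totals are summed after all threads are joined, so no lock is needed.

```python
import threading

# Sketch: each thread counts "A"s in its own text and stores the result in
# its own slot, so no lock is needed. In-memory strings stand in for the
# files here (an assumption to keep the example self-contained).
texts = [
    "A cat ate a BANANA",
    "AAA",
    "no capital letter here",
    "ALPHA and OMEGA",
]

counts = [0] * len(texts)  # one slot per thread; no two threads share a slot

def count_a(index):
    local = 0              # private to this thread
    for ch in texts[index]:
        if ch == "A":
            local += 1
    counts[index] = local  # single write to this thread's own slot

threads = [threading.Thread(target=count_a, args=(i,)) for i in range(len(texts))]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = sum(counts)        # add them all up at the end, in one thread
print(counts, total)       # → [4, 3, 0, 3] 10
```

concurrent.futures.ThreadPoolExecutor.map is a convenient alternative that hands each thread's return value back to you in order, which saves managing the result slots by hand.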
