Using multiple threads in Python


Question


I'm trying to solve a problem, where I have many (on the order of ten thousand) URLs, and need to download the content from all of them. I've been doing this in a "for link in links:" loop up till now, but the amount of time it's taking is now too long. I think it's time to implement a multithreaded or multiprocessing approach. My question is, what is the best approach to take?


I know about the Global Interpreter Lock, but since my problem is network-bound, not CPU-bound, I don't think that will be an issue. I need to pass data back from each thread/process to the main thread/process. I don't need help implementing whatever approach (Terminate multiple threads when any thread completes a task covers that), I need advice on which approach to take. My current approach:

data_list = get_data(...)
output = []
for datum in data_list:  # one blocking download per URL, sequentially
    output.append(get_URL_data(datum))
return output

There is no other shared state.


I think the best approach would be to have a queue with all the data in it, and have several worker threads pop from the input queue, get the URL data, then push onto an output queue.


Am I right? Is there anything I'm missing? This is my first time implementing multithreaded code in any language, and I know it's generally a Hard Problem.
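The queue-based design described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: `get_URL_data` stands in for the question's own download function (a trivial stub is used here so the sketch is self-contained), and the thread count is an arbitrary choice.

```python
import threading
import queue

def get_URL_data(datum):
    # Stand-in for the real download logic; the question's
    # get_URL_data would fetch the URL's content here.
    return datum.upper()

def worker(in_q, out_q):
    # Pop work items until the input queue is drained, then exit.
    while True:
        try:
            datum = in_q.get_nowait()
        except queue.Empty:
            return
        out_q.put(get_URL_data(datum))

def fetch_all(data_list, num_threads=8):
    in_q = queue.Queue()
    out_q = queue.Queue()
    for datum in data_list:
        in_q.put(datum)
    threads = [threading.Thread(target=worker, args=(in_q, out_q))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # All workers have finished, so the output queue is complete.
    # Note: results arrive in completion order, not input order.
    return [out_q.get() for _ in range(out_q.qsize())]
```

Because all work is enqueued before the threads start and each worker exits when the input queue is empty, no sentinel values are needed here.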

Answer


For your specific task I would recommend a multiprocessing worker pool. You simply define a pool and tell it how many processes you want to use (one per processor core by default) as well as a function you want to run on each unit of work. Then you put every unit of work (in your case, the URLs) into a list and give that list to the worker pool.


Your output will be a list of the return values of your worker function, one for every item of work in your original list, in the same order. There are of course other ways of working with the worker pool as well, but this is my favourite one.
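A minimal sketch of the pool approach, under the same assumption as before: `get_URL_data` is the question's download function, stubbed out here so the example runs on its own, and the process count of 4 is arbitrary (omit `processes` to get the default of one per core).

```python
from multiprocessing import Pool

def get_URL_data(url):
    # Stand-in for the real download; in practice this would
    # fetch the URL and return its content.
    return len(url)

def fetch_all(urls):
    with Pool(processes=4) as pool:
        # map() splits the list across the worker processes, blocks
        # until every item is done, and returns results in input order.
        return pool.map(get_URL_data, urls)

if __name__ == "__main__":
    print(fetch_all(["http://a", "http://bb"]))
```

Note that the worker function must be defined at module top level so it can be pickled and sent to the child processes, and the pool should be created under an `if __name__ == "__main__":` guard on platforms that spawn rather than fork.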

Happy multiprocessing!
