Multiprocessing: More processes than cpu.count

Question

Note: I "forayed" into the land of multiprocessing 2 days ago, so my understanding is very basic.

I am writing an application that uploads files to Amazon S3 buckets. When a file is large (100 MB), I upload it in parallel using Pool from the multiprocessing module. I am on a Core i7 machine, and cpu_count reports 8. I was under the impression that with pool = Pool(processes=6) I would use 6 cores: the file is uploaded in parts, and the uploads for the first 6 parts begin simultaneously. To see what happens when the number of processes is greater than cpu_count, I entered 20 (implying that I want to use 20 cores). To my surprise, instead of getting a block of errors, the program began to upload 20 parts simultaneously (I used a smaller chunk size to make sure there were plenty of parts). I don't understand this behavior. I have only 8 cores, so how can the program accept an input of 20? When I say processes=6, does it actually use 6 threads? That would be the only explanation I can see for 20 being a valid input, as there can be thousands of threads. Can someone please explain this to me?

I 'borrowed' the code from here. I changed it only slightly: I ask the user to choose the core usage instead of setting parallel_processes to 4.

Recommended answer

The number of processes running concurrently on your computer is not limited by the number of cores. In fact, you probably have hundreds of programs running on your computer right now, each with its own process. To make this work, the OS assigns one of your 8 processors to each process or thread only temporarily: at some point the process may be stopped and another will take its place. See "What is the difference between concurrent programming and parallel programming?" if you want to find out more.
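A minimal sketch of the point above, not the asker's upload code: it asks Pool for 20 worker processes regardless of the core count and collects the process IDs that handled the tasks. The names report_pid and distinct_worker_pids are hypothetical helpers introduced for illustration, and the short sleep is an assumption made only so that tasks tend to spread across workers.

```python
import multiprocessing as mp
import os
import time


def report_pid(_):
    """Run inside a worker process and return that worker's process ID."""
    time.sleep(0.2)  # keep the worker busy briefly so other workers pick up tasks
    return os.getpid()


def distinct_worker_pids(n_workers, n_tasks):
    """Start n_workers real OS processes and return the PIDs that ran tasks."""
    with mp.Pool(processes=n_workers) as pool:
        # chunksize=1 hands out one task at a time instead of batching them
        return set(pool.map(report_pid, range(n_tasks), chunksize=1))


if __name__ == "__main__":
    n_workers = 20  # deliberately larger than the 8 cores in the question
    pids = distinct_worker_pids(n_workers, n_tasks=n_workers)
    print("cores reported by cpu_count():", mp.cpu_count())
    print("distinct worker processes used:", len(pids))
```

The exact number of distinct PIDs can vary from run to run, since a worker that starts early may take more than one task. What matters is that the pool starts without error: these are separate processes, not threads, and the OS scheduler shares the available cores among them.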

Assigning more processes in your uploading example may or may not make sense. Reading from disk and sending over the network are normally blocking operations in Python. A process that is waiting for its chunk of data to be read or sent can be halted so that another process may start its I/O. On the other hand, with too many processes either file I/O or network I/O will become a bottleneck, and your program will slow down because of the additional overhead of process switching.
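One way to see the first half of this trade-off is to time a stand-in for a blocking upload at several pool sizes. This is only a sketch under stated assumptions: fake_upload is a hypothetical function that sleeps instead of talking to S3, timed_run is a hypothetical helper, and the pool sizes 2, 8 and 20 are chosen to echo the numbers in the question.

```python
import multiprocessing as mp
import time


def fake_upload(part_number, seconds=0.1):
    """Hypothetical stand-in for a blocking upload: waits without using CPU."""
    time.sleep(seconds)
    return part_number


def timed_run(n_workers, n_parts):
    """Push n_parts through a pool of n_workers; return (results, seconds)."""
    start = time.perf_counter()
    with mp.Pool(processes=n_workers) as pool:
        done = pool.map(fake_upload, range(n_parts), chunksize=1)
    # The elapsed time also includes the cost of starting the worker processes
    return done, time.perf_counter() - start


if __name__ == "__main__":
    for n_workers in (2, 8, 20):
        _, elapsed = timed_run(n_workers, n_parts=20)
        print(f"{n_workers:2d} workers: {elapsed:.2f} s")
```

Because a sleeping process needs no core, larger pools would be expected to finish this simulated workload sooner, even with more workers than cores. The sketch does not model the second half of the trade-off: a real upload shares a single disk and network link, so past some pool size the extra processes only add switching and startup overhead. Measuring against the real bucket is the only way to find that point.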
