Python multiprocessing: restrict number of cores used
Question
I want to know how to distribute N independent tasks to exactly M processors on a machine that has L cores, where L > M. I don't want to use all of the processors because I still want to have I/O capacity available. The solutions I've tried seem to get distributed to all processors, bogging down the system.
I assume the multiprocessing module is the way to go.
I do numerical simulations. My background is in physics, not computer science, so unfortunately, I often don't fully understand discussions involving standard tasking models like server/client, producer/consumer, etc.
Here are some simplified models that I've tried:
Suppose I have a function run_sim(**kwargs) (see further below) that runs a simulation, and a long list of kwargs for the simulations, and I have an 8-core machine.
from multiprocessing import Pool, Process

# using Pool
p = Pool(4)
p.map(run_sim, kwargs)

# using Process
number_of_live_jobs = 0
all_jobs = []
sim_index = 0
while sim_index < len(kwargs):
    number_of_live_jobs = len([1 for job in all_jobs if job.is_alive()])
    if number_of_live_jobs <= 4:
        p = Process(target=run_sim, args=[], kwargs=kwargs[sim_index])
        print "starting job", kwargs[sim_index]["data_file_name"]
        print "number of live jobs: ", number_of_live_jobs
        p.start()
        p.join()  # note: this blocks until the job finishes, so jobs run one at a time
        all_jobs.append(p)
        sim_index += 1
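One bug worth calling out in the Process-based loop above: calling p.join() immediately after p.start() blocks until that job finishes, so the loop can never have more than one job alive at a time. A throttled version starts jobs without joining them, and only waits when all worker slots are full. Below is a minimal Python 3 sketch of that pattern (unlike the Python 2 snippets above); MAX_WORKERS and the trivial run_sim stub are illustrative stand-ins, not the original simulation:

```python
import time
from multiprocessing import Process

MAX_WORKERS = 4  # illustrative: at most this many simulations run at once

def run_sim(number_of_steps=0, sigma=1, data_file_name="out"):
    # placeholder for the real, CPU-heavy simulation
    time.sleep(0.01)

def run_all(kwargs_list):
    started, live = [], []
    for kw in kwargs_list:
        # wait until a worker slot frees up before starting the next job
        while len(live) >= MAX_WORKERS:
            live = [p for p in live if p.is_alive()]
            time.sleep(0.05)
        p = Process(target=run_sim, kwargs=kw)
        p.start()
        started.append(p)
        live.append(p)
    # now wait for everything to finish
    for p in started:
        p.join()
    return started

if __name__ == "__main__":
    run_all([{"sigma": s} for s in range(8)])
    print("all done")
```

This is essentially what Pool(4) already does for you, which is why Pool is usually the simpler choice.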
When I look at the processor usage with "top" and then pressing "1", all processors seem to get used anyway in either case. It is not out of the question that I am misinterpreting the output of "top", but if run_sim() is processor-intensive, the machine bogs down heavily.
Hypothetical simulation and data:
# simulation kwargs
numbers_of_steps = range(0, 10000000, 1000000)
sigmas = [x for x in range(11)]

kwargs = []
for number_of_steps in numbers_of_steps:
    for sigma in sigmas:
        kwargs.append(
            dict(
                number_of_steps=number_of_steps,
                sigma=sigma,
                # why do I need to cast to int?
                data_file_name="walk_steps=%i_sigma=%i" % (number_of_steps, sigma),
            )
        )
import random, time
random.seed(time.time())

# simulation of a random walk
def run_sim(kwargs):
    number_of_steps = kwargs["number_of_steps"]
    sigma = kwargs["sigma"]
    data_file_name = kwargs["data_file_name"]
    data_file = open(data_file_name + ".dat", "w")
    current_position = 0
    print "running simulation", data_file_name
    for n in range(int(number_of_steps) + 1):
        data_file.write("step number %i position=%f\n" % (n, current_position))
        random_step = random.gauss(0, sigma)
        current_position += random_step
    data_file.close()
Answer
On my dual-core machine the total number of processes is honoured, i.e. if I do
p = Pool(1)
then I only see one CPU in use at any given time. The process is free to migrate to a different processor, but then the other processor is idle. I don't see how all your processors can be in use at the same time, so I don't follow how this can be related to your I/O issues. Of course, if your simulation is I/O bound, then you will see sluggish I/O regardless of core usage...
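Beyond limiting the number of workers with Pool(M), you can also pin the workers to a specific subset of cores so the remaining cores stay free for I/O. This is not part of the original answer, and it is Linux-only: os.sched_setaffinity (Python 3.3+) restricts which CPUs a process may run on. A sketch that reserves half of the currently available cores; the square helper is just a placeholder task:

```python
import os
from multiprocessing import Pool

# Use only half of the CPUs this process may currently run on,
# leaving the rest free for other work (the 50/50 split is illustrative).
available = sorted(os.sched_getaffinity(0))
ALLOWED_CPUS = set(available[: max(1, len(available) // 2)])

def pin_worker():
    # Restrict this worker process to the allowed cores (Linux-only).
    os.sched_setaffinity(0, ALLOWED_CPUS)

def square(x):
    # placeholder for a CPU-heavy simulation task
    return x * x

if __name__ == "__main__":
    with Pool(processes=len(ALLOWED_CPUS), initializer=pin_worker) as pool:
        print(pool.map(square, range(5)))  # [0, 1, 4, 9, 16]
```

The same effect can be had without code changes by launching the script under taskset(1), e.g. taskset -c 0-3 python script.py.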