Python multiprocessing: restrict number of cores used


Question


I want to know how to distribute N independent tasks to exactly M processors on a machine that has L cores, where L>M. I don't want to use all the processors because I still want to have I/O available. The solutions I've tried seem to get distributed to all processors, bogging down the system.


I assume the multiprocessing module is the way to go.


I do numerical simulations. My background is in physics, not computer science, so unfortunately, I often don't fully understand discussions involving standard tasking models like server/client, producer/consumer, etc.


Here are some simplified models that I've tried:


Suppose I have a function run_sim(**kwargs) (see below) that runs a simulation, and a long list of kwargs for the simulations, and I have an 8-core machine.

from multiprocessing import Pool, Process

# using Pool
p = Pool(4)
p.map(run_sim, kwargs)

# using Process
number_of_live_jobs = 0
all_jobs = []
sim_index = 0
while sim_index < len(kwargs):
   number_of_live_jobs = len([1 for job in all_jobs if job.is_alive()])
   if number_of_live_jobs <= 4:
      p = Process(target=run_sim, args=[], kwargs=kwargs[sim_index])
      print("starting job", kwargs[sim_index]["data_file_name"])
      print("number of live jobs: ", number_of_live_jobs)
      p.start()
      p.join()  # note: joining here blocks until the job finishes, so jobs run one at a time
      all_jobs.append(p)
      sim_index += 1


When I look at the processor usage with "top" and then "1", all processors seem to get used in either case. It is not out of the question that I am misinterpreting the output of "top", but if run_sim() is processor-intensive, the machine bogs down heavily.

Hypothetical simulation and data:

# simulation kwargs
numbers_of_steps = range(0,10000000, 1000000)
sigmas = [x for x in range(11)]
kwargs = []
for number_of_steps in numbers_of_steps:
   for sigma in sigmas:
      kwargs.append(
         dict(
            number_of_steps=number_of_steps,
            sigma=sigma,
            # why do I need to cast to int?
            data_file_name="walk_steps=%i_sigma=%i" % (number_of_steps, sigma),
            )
         )

import random, time
random.seed(time.time())

# simulation of a random walk
def run_sim(kwargs):
   number_of_steps = kwargs["number_of_steps"]
   sigma = kwargs["sigma"]
   data_file_name = kwargs["data_file_name"]
   current_position = 0
   print("running simulation", data_file_name)
   with open(data_file_name + ".dat", "w") as data_file:
      for n in range(int(number_of_steps) + 1):
         data_file.write("step number %i   position=%f\n" % (n, current_position))
         random_step = random.gauss(0, sigma)
         current_position += random_step

Answer


On my dual-core machine the total number of processes is honoured, i.e. if I do

p = Pool(1)


Then I only see one CPU in use at any given time. The process is free to migrate to a different processor, but then the other processor is idle. I don't see how all your processors can be in use at the same time, so I don't follow how this can be related to your I/O issues. Of course, if your simulation is I/O bound, then you will see sluggish I/O regardless of core usage...
