Python multiprocessing doesn't seem to use more than one core


Question

I want to use Python multiprocessing to run a grid search for a predictive model. When I look at core usage, it always seems to be using only one core. Any idea what I'm doing wrong?

import multiprocessing
from sklearn import svm
import itertools
from operator import itemgetter

#first read some data
#X will be my feature Numpy 2D array
#y will be my 1D Numpy array of labels

#define the grid        
C = [0.1, 1]
gamma = [0.0]
params = [C, gamma]
grid = list(itertools.product(*params))
GRID_hx = []

def worker(par, grid_list):
    #define a sklearn model
    clf = svm.SVC(C=par[0], gamma=par[1], probability=True, random_state=SEED)
    #run a cross validation function: returns error
    ll = my_cross_validation_function(X, y, model=clf, n=1, test_size=0.2)
    print(par, ll)
    grid_list.append((par, ll))


if __name__ == '__main__':
   manager = multiprocessing.Manager()
   GRID_hx = manager.list()
   jobs = []
   for g in grid:
      p = multiprocessing.Process(target=worker, args=(g,GRID_hx))
      jobs.append(p)
      p.start()
      p.join()

   print("\n-------------------")
   print("SORTED LIST")
   print("-------------------")
   L = sorted(GRID_hx, key=itemgetter(1))
   for l in L[:5]:
      print(l)

Recommended answer

Your problem is that you join each job immediately after starting it:

for g in grid:
    p = multiprocessing.Process(target=worker, args=(g,GRID_hx))
    jobs.append(p)
    p.start()
    p.join()

join() blocks until the corresponding process has finished. This means your code starts only one process at a time, waits for it to finish, and then starts the next one.

For all processes to run in parallel, you need to start them all first and then join them all:

jobs = []
for g in grid:
    p = multiprocessing.Process(target=worker, args=(g,GRID_hx))
    jobs.append(p)
    p.start()

for j in jobs:
    j.join()
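
For reference, here is a minimal self-contained sketch of the same start-all-then-join pattern that can be run without scikit-learn. The names fake_score and run_grid are hypothetical and not part of the original code; fake_score is only a stand-in for my_cross_validation_function and returns a dummy error value after a short CPU-bound loop.

```python
import itertools
import multiprocessing
from operator import itemgetter


def fake_score(par):
    """Hypothetical stand-in for the cross-validation call (assumption)."""
    c, gamma = par
    # Small CPU-bound loop so each process has some real work to do.
    sum(i * i for i in range(200_000))
    # Dummy "error": zero when C == 1 and gamma == 0.
    return abs(c - 1.0) + gamma


def worker(par, grid_list):
    # Append (parameters, error) to the list shared between processes.
    grid_list.append((par, fake_score(par)))


def run_grid(grid):
    """Start one process per grid point, then join them all."""
    with multiprocessing.Manager() as manager:
        results = manager.list()
        jobs = [
            multiprocessing.Process(target=worker, args=(g, results))
            for g in grid
        ]
        for p in jobs:
            p.start()  # all processes are started before any join
        for p in jobs:
            p.join()   # only now wait for each one to finish
        # Copy out of the proxy while the manager is still alive.
        return sorted(list(results), key=itemgetter(1))


if __name__ == "__main__":
    grid = list(itertools.product([0.1, 1], [0.0]))
    for par, err in run_grid(grid):
        print(par, err)
```

Note that this starts one process per grid point, which is fine for a small grid like the one above but may start more processes than there are cores when the grid is large.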

Documentation: link
