Why does GridSearchCV spend more than 50% of its time on {method 'acquire' of 'thread.lock' objects}?
Question
Recently I have been tuning some of my machine learning pipelines. I decided to take advantage of my multicore processor, so I ran cross-validation with the parameter n_jobs=-1. I also profiled it, and to my surprise the top function was:
{method 'acquire' of 'thread.lock' objects}
I was not sure whether this was my own fault, caused by the operations I do in the Pipeline, so I decided to run a small experiment:
pp = Pipeline([('svc', SVC())])
cv = GridSearchCV(pp, {'svc__C': [1, 100, 200]}, n_jobs=-1, cv=2, refit=True)
%prun cv.fit(np.random.rand(10000, 100), np.random.randint(0, 5, 10000))
The output is:
2691 function calls (2655 primitive calls) in 74.005 seconds
Ordered by: internal time
ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
    83   43.819    0.528   43.819    0.528  {method 'acquire' of 'thread.lock' objects}
     1   30.112   30.112   30.112   30.112  {sklearn.svm.libsvm.fit}
I wonder what causes this behavior, and whether it is possible to speed it up a little.
Recommended answer
The profiler only tells you what the main process is doing, while its child processes do all the work. With n_jobs=-1 the main process mostly waits for the workers to finish, which shows up as time spent in the lock's acquire method; the single libsvm fit it does record is most likely the final fit triggered by refit=True. Setting verbose=2 on GridSearchCV may give better output than %prun in this case.
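As a rough illustration of that suggestion, the sketch below reruns the experiment with verbose=2 and then reads the timings that GridSearchCV collects from its workers. The small data set, the fixed random seed, and the variable names are assumptions made for this example so that it finishes quickly. It also assumes a recent scikit-learn, where GridSearchCV is imported from sklearn.model_selection and exposes cv_results_ and refit_time_; the older version used in the question may differ.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Small synthetic data set so the sketch runs quickly (sizes are arbitrary).
rng = np.random.RandomState(0)
X = rng.rand(200, 10)
y = rng.randint(0, 5, 200)

pipeline = Pipeline([("svc", SVC())])
search = GridSearchCV(
    pipeline,
    {"svc__C": [1, 100, 200]},
    n_jobs=-1,   # use all available cores, as in the question
    cv=2,
    refit=True,
    verbose=2,   # log each fit together with its elapsed time
)
search.fit(X, y)

# Fit times measured for each parameter setting, averaged over the CV folds.
for params, fit_time in zip(search.cv_results_["params"],
                            search.cv_results_["mean_fit_time"]):
    print(params, "mean fit time: %.3f s" % fit_time)

# Time spent on the final refit on the whole data set.
print("refit time: %.3f s" % search.refit_time_)
```

Comparing the per-candidate fit times with the refit time gives a picture of where the wall-clock time goes that a profile of the main process alone cannot show.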