Increasing n_jobs has no effect on GridSearchCV
Question
I set up a simple experiment to check the importance of a multi-core CPU while running sklearn GridSearchCV with KNeighborsClassifier. The results I got surprised me, and I wonder whether I misunderstood the benefits of multiple cores or just did it wrong.
There is no difference in time to completion between 2 and 8 jobs. How come? I did notice a difference on the CPU performance tab: while the first cell was running, CPU usage was ~13%, and it gradually increased to 100% for the last cell. I expected it to finish faster. Maybe not linearly faster, i.e. 8 jobs being 2 times faster than 4 jobs, but at least somewhat faster.
Here is how I set it up:
I am using a Jupyter notebook; "cell" below refers to a notebook cell.
I loaded MNIST and used a 0.05 test size to get 3000 digits in X_play.
from sklearn.datasets import fetch_openml  # fetch_mldata was removed in scikit-learn 0.22
from sklearn.model_selection import train_test_split

# 'mnist_784' on OpenML replaces the old 'MNIST original' dataset
mnist = fetch_openml('mnist_784', version=1, as_frame=False)
X, y = mnist["data"], mnist["target"]
X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]
_, X_play, _, y_play = train_test_split(X_train, y_train, test_size=0.05,
                                        random_state=42, stratify=y_train, shuffle=True)
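As a quick sanity check on the split size, here is a minimal sketch; the zero-filled array is only a stand-in for the real pixel data of the standard 60000-sample MNIST training set:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in for the 60000-sample MNIST training set (784 pixels per digit).
X_train = np.zeros((60000, 784))
y_train = np.repeat(np.arange(10), 6000)  # balanced labels so stratify works

# test_size=0.05 of 60000 leaves 3000 digits in X_play.
_, X_play, _, y_play = train_test_split(X_train, y_train, test_size=0.05,
                                        random_state=42, stratify=y_train,
                                        shuffle=True)
print(X_play.shape)  # (3000, 784)
```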
In the next cell I set up KNN and a GridSearchCV:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV
knn_clf = KNeighborsClassifier()
param_grid = [{'weights': ["uniform", "distance"], 'n_neighbors': [3, 4, 5]}]
Then I ran 8 cells, one for each n_jobs value. My CPU is an i7-4770 with 4 cores and 8 threads.
grid_search = GridSearchCV(knn_clf, param_grid, cv=3, verbose=3, n_jobs=N_JOB_1_TO_8)
grid_search.fit(X_play, y_play)
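The eight separate cells can be sketched as one loop. The snippet below is a hypothetical reduced-scale version that uses the small built-in digits dataset in place of the MNIST slice, timing GridSearchCV for a few n_jobs values:

```python
import time
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)  # small stand-in for X_play / y_play
param_grid = [{'weights': ["uniform", "distance"], 'n_neighbors': [3, 4, 5]}]

timings = {}
for n_jobs in (1, 2, 4):
    grid_search = GridSearchCV(KNeighborsClassifier(), param_grid,
                               cv=3, n_jobs=n_jobs)
    start = time.perf_counter()
    grid_search.fit(X, y)
    timings[n_jobs] = time.perf_counter() - start

print(timings)  # wall-clock seconds per n_jobs value; varies by machine
```

On a dataset this small the parallel overhead can dominate, so do not expect the reduced-scale timings to improve with more jobs.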
Results
[Parallel(n_jobs=1)]: Done 18 out of 18 | elapsed: 2.0min finished
[Parallel(n_jobs=2)]: Done 18 out of 18 | elapsed: 1.4min finished
[Parallel(n_jobs=3)]: Done 18 out of 18 | elapsed: 1.3min finished
[Parallel(n_jobs=4)]: Done 18 out of 18 | elapsed: 1.3min finished
[Parallel(n_jobs=5)]: Done 18 out of 18 | elapsed: 1.4min finished
[Parallel(n_jobs=6)]: Done 18 out of 18 | elapsed: 1.4min finished
[Parallel(n_jobs=7)]: Done 18 out of 18 | elapsed: 1.4min finished
[Parallel(n_jobs=8)]: Done 18 out of 18 | elapsed: 1.4min finished
Second test
Scaling with the Random Forest Classifier was much better. The test size was 0.5, i.e. 30000 images.
from sklearn.ensemble import RandomForestClassifier
rf_clf = RandomForestClassifier()
param_grid = [{'n_estimators': [20, 30, 40, 50, 60], 'max_features': [100, 200, 300, 400, 500], 'criterion': ['gini', 'entropy']}]
[Parallel(n_jobs=1)]: Done 150 out of 150 | elapsed: 110.9min finished
[Parallel(n_jobs=2)]: Done 150 out of 150 | elapsed: 56.8min finished
[Parallel(n_jobs=3)]: Done 150 out of 150 | elapsed: 39.3min finished
[Parallel(n_jobs=4)]: Done 150 out of 150 | elapsed: 35.3min finished
[Parallel(n_jobs=5)]: Done 150 out of 150 | elapsed: 36.0min finished
[Parallel(n_jobs=6)]: Done 150 out of 150 | elapsed: 34.4min finished
[Parallel(n_jobs=7)]: Done 150 out of 150 | elapsed: 32.1min finished
[Parallel(n_jobs=8)]: Done 150 out of 150 | elapsed: 30.1min finished
Answer
Here are some reasons that might cause this behaviour:
- With an increasing number of threads, there is noticeable overhead for initializing and releasing each thread. I ran your code on my i7-7700HQ and saw the following behaviour with each increase of n_jobs (time per thread means the time for GridSearchCV to fully train and evaluate one model):
- when n_jobs=1 and n_jobs=2, the time per thread was 2.9 s (overall time ~2 min)
- when n_jobs=3, the time was 3.4 s (overall time 1.4 min)
- when n_jobs=4, the time was 3.8 s (overall time 58 s)
- when n_jobs=5, the time was 4.2 s (overall time 51 s)
- when n_jobs=6, the time was 4.2 s (overall time ~49 s)
- when n_jobs=7, the time was 4.2 s (overall time ~49 s)
- when n_jobs=8, the time was 4.2 s (overall time ~49 s)
Now, as you can see, the time per thread increased, but the overall time decreased (although beyond n_jobs=4 the improvement was not linear) and remained roughly constant for n_jobs>=6. This is due to the cost of initializing and releasing threads. See this GitHub issue and this issue.
Also, there might be other bottlenecks, such as the data being too large to broadcast to all threads at the same time, thread contention over RAM (or other resources), how data is pushed into each thread, etc.
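For reference, the speedups implied by the Random Forest timings above can be computed directly:

```python
# Elapsed minutes from the Random Forest grid search, keyed by n_jobs.
elapsed = {1: 110.9, 2: 56.8, 3: 39.3, 4: 35.3,
           5: 36.0, 6: 34.4, 7: 32.1, 8: 30.1}

# Speedup relative to the single-job run.
speedup = {n: round(elapsed[1] / t, 2) for n, t in elapsed.items()}
print(speedup)
# {1: 1.0, 2: 1.95, 3: 2.82, 4: 3.14, 5: 3.08, 6: 3.22, 7: 3.45, 8: 3.68}
```

Note that 8 jobs give only ~3.7x on 8 hardware threads, and the dip at n_jobs=5 is consistent with the thread-overhead argument.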
I suggest you read about Amdahl's Law, which states that there is a theoretical bound on the speedup achievable through parallelization: if a fraction p of the work can be parallelized across n workers, the speedup is at most 1 / ((1 - p) + p/n). (Image source: Amdahl's Law: Wikipedia)
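Amdahl's Law can be written as a one-liner; the values of p and n below are illustrative, not measurements from this experiment:

```python
def amdahl_speedup(p, n):
    """Upper bound on speedup when a fraction p of the work runs on n workers."""
    return 1.0 / ((1.0 - p) + p / n)

# Even with 90% of the work parallelizable, 8 workers cap out below 5x.
print(round(amdahl_speedup(0.9, 8), 2))  # 4.71
```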
Finally, it might also be due to the size of the data and the complexity of the model you are training.
Here is a blog post explaining the same issue regarding multithreading.