How to use multiple CPU cores to train NNs using caffe and OpenBLAS
Question
I have been learning deep learning recently, and a friend recommended caffe to me. After installing it with OpenBLAS, I followed the MNIST tutorial in the docs. But later I found it was very slow, and only one CPU core was working.
The problem is that the servers in my lab don't have GPUs, so I have to use CPUs instead.
I Googled this and found some pages like this one. I tried export OPENBLAS_NUM_THREADS=8 and export OMP_NUM_THREADS=8, but caffe still used only one core.
How can I make caffe use multiple CPUs?
Thanks a lot.
Answer
@Karthik: that also works for me. One interesting discovery I made was that using 4 threads reduces the forward/backward pass time during the caffe timing test by a factor of 2. However, increasing the thread count to 8 or even 24 results in an f/b speed lower than what I get with OPENBLAS_NUM_THREADS=4. Here are the times for a few thread counts (tested on the NetworkInNetwork model).
[#threads]  [f/b time in ms]
 1          223
 2          150
 4          113
 8          125
12          144
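A benchmark like the one above can be reproduced with caffe's built-in timing tool. The sketch below loops over thread counts and sets both environment variables before each run, since caffe's CPU path may use OpenMP as well as OpenBLAS; the model path is a placeholder you would replace with your own prototxt, and the script assumes caffe was built against OpenBLAS (BLAS := open in Makefile.config).

```shell
# Placeholder model definition; substitute your own network prototxt.
MODEL=models/bvlc_alexnet/deploy.prototxt

for n in 1 2 4 8 12; do
    # Set both variables: OpenBLAS and OpenMP each read their own.
    export OPENBLAS_NUM_THREADS=$n
    export OMP_NUM_THREADS=$n
    echo "threads=$n"
    if command -v caffe >/dev/null 2>&1; then
        # 'caffe time' reports average forward/backward pass times.
        caffe time -model "$MODEL" -iterations 50
    else
        echo "caffe binary not found; skipping run"
    fi
done
```

Note that these variables must be set in the environment of the caffe process itself; exporting them in a different shell session has no effect.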
For comparison, on a Titan X GPU the f/b pass took 1.87 ms.