How to use multi CPU cores to train NNs using caffe and OpenBLAS

Question

I have been learning deep learning recently, and a friend recommended caffe to me. After installing it with OpenBLAS, I followed the MNIST tutorial in the docs. But I then found it was extremely slow, and only one CPU core was working.

The problem is that the servers in my lab don't have GPUs, so I have to use CPUs instead.

I Googled this and found some pages like this one. I tried export OPENBLAS_NUM_THREADS=8 and export OMP_NUM_THREADS=8, but caffe still used only one core.
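For reference, the exact commands I ran are sketched below. The variables have to be exported in the same shell before caffe is launched, since OpenBLAS reads OPENBLAS_NUM_THREADS when the library is loaded. The solver path is the default from the MNIST tutorial; it assumes caffe was built against OpenBLAS (BLAS := open in Makefile.config).

```shell
# Set the thread-count variables before launching caffe.
export OPENBLAS_NUM_THREADS=8   # read by OpenBLAS itself
export OMP_NUM_THREADS=8        # only matters if OpenBLAS was built with USE_OPENMP=1

# MNIST tutorial path, run from the caffe source root.
./build/tools/caffe train --solver=examples/mnist/lenet_solver.prototxt
```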

How can I make caffe use multiple CPU cores?

Many thanks.

Answer

@Karthik. That also works for me. One interesting discovery I made was that using 4 threads cuts the forward/backward pass time in the caffe timing test by a factor of 2. However, increasing the thread count to 8 or even 24 results in f/b speed lower than what I get with OPENBLAS_NUM_THREADS=4. Here are the times for a few thread counts (tested on the NetworkInNetwork model).

#threads   f/b time (ms)
 1         223
 2         150
 4         113
 8         125
12         144

For comparison, on a Titan X GPU the f/b pass took 1.87 ms.
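A sweep like the one above can be reproduced with caffe's built-in timing tool. A minimal sketch follows; the prototxt path is a placeholder (point it at your own NetworkInNetwork model), and it relies on the "Average Forward-Backward" line that caffe time prints per run.

```shell
# Time the net at several OpenBLAS thread counts.
for t in 1 2 4 8 12; do
  echo "threads=$t"
  OPENBLAS_NUM_THREADS=$t ./build/tools/caffe time \
      --model=models/nin/train_val.prototxt --iterations=50 2>&1 \
      | grep 'Average Forward-Backward'
done
```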
