Why is Keras LSTM on CPU three times faster than GPU?

Problem description

I used this notebook from Kaggle to run an LSTM neural network.

I started training the neural network and found it was too slow: GPU training is almost three times slower than CPU training.

  • CPU performance: 8 min per epoch;
  • GPU performance: 26 min per epoch.

After this I decided to find an answer in this question on Stackoverflow, and I applied CuDNNLSTM (which runs only on a GPU) instead of LSTM.
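
For reference, a minimal sketch of that swap, assuming a simple sequential model (the units and input shape below are hypothetical; the notebook's actual architecture is not reproduced here):

```python
from keras.models import Sequential
from keras.layers import CuDNNLSTM, Dense

timesteps, features = 100, 32  # hypothetical input shape

model = Sequential()
# Drop-in replacement for LSTM(64, input_shape=...); CuDNNLSTM requires a
# CUDA-capable GPU with the cuDNN library and runs the fused cuDNN RNN kernel.
model.add(CuDNNLSTM(64, input_shape=(timesteps, features)))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam')
```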

As a result, GPU performance became only 1 min per epoch, but the accuracy of the model decreased by 3%.

Questions:

1) Does anybody know why the GPU works slower than the CPU with the classic LSTM layer? I do not understand why this happens.

2) Why does training become much faster and the model's accuracy decrease when I use CuDNNLSTM instead of LSTM?

P.S.:

My CPU: Intel Core i7-7700 Processor (8M Cache, up to 4.20 GHz)

My GPU: nVidia GeForce GTX 1050 Ti (4 GB)

Recommended answer

Guessing it's just a different, better implementation and, if the implementation is different, you shouldn't expect identical results.

In general, efficiently implementing an algorithm on a GPU is hard, and getting maximum performance requires architecture-specific implementations. Therefore, it wouldn't be surprising if an implementation specific to Nvidia's GPUs had enhanced performance versus a general implementation for GPUs. It also wouldn't be surprising that Nvidia would sink significantly more resources into accelerating their code for their GPUs than would a team working on a general CNN implementation.
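
As one concrete illustration (a sketch, not part of the original answer): CuDNNLSTM hard-codes choices that the plain LSTM layer leaves configurable, so even a plain LSTM configured to approximate it will not match it exactly:

```python
from keras.layers import LSTM

# Plain LSTM configured to approximate what CuDNNLSTM hard-codes
# (a sketch; the two kernels can still differ numerically):
lstm = LSTM(
    64,
    activation='tanh',               # fixed in the cuDNN kernel
    recurrent_activation='sigmoid',  # Keras 2's plain LSTM defaults to 'hard_sigmoid'
    recurrent_dropout=0.0,           # the cuDNN kernel supports no recurrent dropout
)
```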

The other possibility is that the data type used on the backend has changed from double- to single- or even half-precision float. The smaller data types mean you can crunch more numbers faster at the cost of accuracy. For NN applications this is often acceptable, because no individual number needs to be especially accurate for the net to produce acceptable results.
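
A minimal sketch of how the backend float type can be inspected or changed in Keras, using the standard keras.backend API (whether the notebook in question actually changed precision is not known):

```python
from keras import backend as K

print(K.floatx())        # the default float type, usually 'float32'
K.set_floatx('float16')  # trades accuracy for speed/memory; sketch only,
                         # as float16 training can be numerically unstable
```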
