AWS, Cuda, Tensorflow

Question

When I'm running my Python code on the most powerful AWS GPU instances (with 1 or 8 x Tesla V100 16 GB, aka p3.2xlarge or p3.16xlarge), why are they both only 2-3 times faster than my DELL XPS laptop with a GeForce 1050 Ti?

I'm using Windows, Keras, Cuda 9, Tensorflow 1.12 and the newest Nvidia drivers.

When I check the GPU load via GPU-Z, the GPU peaks at 43% load for only a very short period each time, while the controller runs at max. 100%...

The dataset I use consists of matrices in JSON format, and the files are located on a 10 TB Nitro drive with a maximum of 64,000 IOPS. Whether the folder contains 10 TB, 1 TB or 100 MB, the training is still very, very slow per iteration. Why?

All suggestions are very welcome!

Update 1:

From the Tensorflow docs:

"要启动输入管道,必须定义一个源.例如,要从内存中的某些张量构造数据集,可以使用tf.data.Dataset.from_tensors()或tf.data.Dataset .from_tensor_slices().或者,如果输入数据以推荐的TFRecord格式存储在磁盘上,则可以构造tf.data.TFRecordDataset."

Before, I had the matrices stored in JSON format (made by Node). My TF runs in Python. I will now save only the coordinates in Node, still in JSON format. The question now is: in Python, what is the best solution to load the data? Can TF use the coordinates only, or do I have to turn the coordinates back into matrices again, or something else?
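
One way to approach this, sketched under the assumption that each JSON file stores a dense shape plus a list of non-zero [row, col] coordinates (that layout and the binary entries are assumptions, not from the post):

```python
import json
import numpy as np

# Assumed JSON layout: {"shape": [H, W], "coords": [[r, c], ...]}
# i.e. only the coordinates of the non-zero entries are stored.
def load_matrix(path):
    with open(path) as f:
        record = json.load(f)
    matrix = np.zeros(record["shape"], dtype=np.float32)
    rows, cols = zip(*record["coords"])
    matrix[list(rows), list(cols)] = 1.0  # assuming binary entries
    return matrix

# Alternatively, keep the data sparse: build a tf.SparseTensor from the
# coordinates and densify inside the graph with tf.sparse_tensor_to_dense.
```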

Recommended answer

First off, you should have a really good reason to accept the increased computational overhead of a Windows-based AMI.

If your CPU is at ~100% while the GPU is <100%, then your CPU is likely the bottleneck. If you are on the cloud, consider moving to instances with a larger CPU count (CPU is cheap, GPU is scarce). If you can't increase the CPU count, moving some parts of your graph to the GPU is an option. However, a tf.data-based input pipeline runs entirely on the CPU (but is highly scalable due to its C++ implementation). Prefetching to GPUs might also help here, but the cost of spawning another background thread to populate the buffer for downstream might dampen this effect. Another option is to do some or all pre-processing steps offline (i.e. prior to training).
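
As a concrete illustration of such a pipeline, a sketch in TF 1.12 style (the file pattern, feature name and sizes are assumptions):

```python
import tensorflow as tf

# Placeholder schema: one flat float vector of length 784 per record.
def parse_example(serialized):
    features = tf.parse_single_example(
        serialized, {"x": tf.FixedLenFeature([784], tf.float32)})
    return features["x"]

files = tf.data.Dataset.list_files("data/*.tfrecords")  # placeholder pattern
ds = tf.data.TFRecordDataset(files)
ds = ds.map(parse_example, num_parallel_calls=8)  # parallel parsing on CPU
ds = ds.batch(64)
ds = ds.prefetch(1)  # overlap input preparation with the training step
# Depending on the TF version, prefetching onto the GPU is available as
# tf.contrib.data.prefetch_to_device or tf.data.experimental.prefetch_to_device.
```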

A word of caution on using Keras as the input pipeline. Keras relies on Python's multithreading (and optionally multiprocessing) libraries, which may lack both performance (when doing heavy I/O or augmentations on the fly) and scalability (when running on multiple CPUs) compared to GIL-free implementations. Consider performing preprocessing offline, pre-loading input data, or using alternative input pipelines (such as the aforementioned TF-native tf.data, or third-party ones like Tensorpack).
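
To make the offline-preprocessing suggestion concrete, a sketch of a one-time conversion from JSON matrices to a TFRecord file (the file names and the {"matrix": ...} layout are assumptions):

```python
import glob
import json
import numpy as np
import tensorflow as tf

# One-time offline conversion: JSON matrices -> a single TFRecord file.
# "data/*.json" and the {"matrix": [[...]]} layout are placeholders.
with tf.python_io.TFRecordWriter("train.tfrecords") as writer:
    for path in glob.glob("data/*.json"):
        with open(path) as f:
            matrix = np.asarray(json.load(f)["matrix"], dtype=np.float32)
        example = tf.train.Example(features=tf.train.Features(feature={
            "x": tf.train.Feature(
                float_list=tf.train.FloatList(value=matrix.ravel()))
        }))
        writer.write(example.SerializeToString())
```

After this one-off step, training reads the TFRecord file through tf.data instead of parsing JSON on the fly, which moves the parsing cost out of the training loop entirely.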
