TensorFlow GPU Epoch Optimization?

Question

So this code works, and it gives me a 2x boost over CPU only, but I think it's possible to get it faster. I think the issue boils down to this area...

for i in tqdm(range(epochs), ascii=True):
    sess.run(train_step, feed_dict={x: train, y_: labels})

I think what happens is that every epoch we go back to the CPU for instructions on what to do next (the for loop), and the for loop then pushes work back to the GPU. Meanwhile, the GPU can fit the entire data set, and more, into memory.

Is it possible, and if so how, to have it continually crunch 1000 epochs on the GPU without coming back to the CPU to report its status? Or perhaps to control how often it reports status? It would be nice to crunch, say, 1000 epochs on the GPU, then check my training vs. validation results, then crunch again. But doing it between every epoch is not really helpful.
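
For reference, a minimal sketch of the kind of chunked status reporting described here (the accuracy, val, and val_labels names are assumptions, not from the code above). Note this only controls how often status is reported; each epoch still makes its own session.run call:

# Report training/validation status every `report_every` epochs
# instead of after every single one.
report_every = 1000
for i in range(0, epochs, report_every):
    for _ in range(report_every):
        sess.run(train_step, feed_dict={x: train, y_: labels})
    # Check progress only at chunk boundaries.
    train_acc = sess.run(accuracy, feed_dict={x: train, y_: labels})
    val_acc = sess.run(accuracy, feed_dict={x: val, y_: val_labels})
    print('epoch %d: train %.4f, val %.4f' % (i + report_every, train_acc, val_acc))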

Thanks,

~David

Answer

The overhead of session.run is around 100 usec, so if you do 10k steps, this overhead adds up to around 1 second. If that is significant, then you are doing many small iterations and incurring extra overhead in other places as well; for instance, GPU kernel launch overhead is 5x larger than on CPU (5 usec vs 1 usec).
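
If you want to measure the per-call overhead on your own machine, here is a minimal sketch (not from the original answer) that times a no-op graph in TF 1.x:

import time
import tensorflow as tf

# Time a no-op session.run to estimate the fixed per-call overhead.
noop = tf.no_op()
with tf.Session() as sess:
    sess.run(noop)  # warm-up call
    n = 10000
    start = time.time()
    for _ in range(n):
        sess.run(noop)
    print('%.1f usec per session.run' % ((time.time() - start) / n * 1e6))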

Using feed_dict is probably a bigger problem, and you could speed things up by using queues/input pipelines.
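
The answer refers to queue-based input; as an illustration of the same idea with the later tf.data API, here is a minimal sketch (assumes TF 1.x and the in-memory train/labels arrays from the question; the batch and buffer sizes are arbitrary placeholders):

import tensorflow as tf

# Build a dataset that lives on the graph, so each training step no longer
# copies a batch from Python into the runtime via feed_dict.
dataset = (tf.data.Dataset.from_tensor_slices((train, labels))
           .shuffle(buffer_size=10000)
           .batch(128)
           .repeat()      # cycle through the data indefinitely
           .prefetch(1))  # overlap input preparation with compute
next_x, next_y = dataset.make_one_shot_iterator().get_next()

# Build the model on next_x/next_y instead of placeholders, then:
# sess.run(train_step)   # no feed_dict needed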

Also, a robust way to figure out where you are spending time is to profile. For example, to figure out what fraction of time is due to your for loop, you can run cProfile as follows.

python -m cProfile -o timing.prof myscript.py
snakeviz timing.prof

To figure out where the time goes inside of the TensorFlow run call itself, you can do timeline profiling as described here.
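
In outline, TF 1.x timeline profiling looks like this (a minimal sketch; sess, train_step, and the feed names come from the question above):

import tensorflow as tf
from tensorflow.python.client import timeline

# Collect run metadata for one step and write a Chrome trace
# that can be opened at chrome://tracing.
run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()
sess.run(train_step,
         feed_dict={x: train, y_: labels},
         options=run_options,
         run_metadata=run_metadata)
tl = timeline.Timeline(run_metadata.step_stats)
with open('timeline.json', 'w') as f:
    f.write(tl.generate_chrome_trace_format())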
