Understanding tensorflow inter/intra parallelism threads


Question

I would like to understand a little more about these two parameters: intra and inter op parallelism threads

session_conf = tf.ConfigProto(
  intra_op_parallelism_threads=1,
  inter_op_parallelism_threads=1)
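The snippet above only builds the config object. A minimal sketch of wiring it into a session, assuming the TF1-era API that matches the stated keras 2.0.9 / tensorflow 1.3.0 setup (`tf.Session` and `keras.backend.set_session`):

```python
import tensorflow as tf
from keras import backend as K

session_conf = tf.ConfigProto(
    intra_op_parallelism_threads=1,  # threads used *within* a single op (e.g. one matmul)
    inter_op_parallelism_threads=1)  # threads used to run *independent* ops concurrently

# All subsequent Keras layers/models will run in this single-threaded session.
sess = tf.Session(config=session_conf)
K.set_session(sess)
```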

I read this post, which has a pretty good explanation: TensorFlow: inter- and intra-op parallelism configuration

But I am seeking confirmation and also asking new questions below. I am running my task in keras 2.0.9, tensorflow 1.3.0:

  1. When both are set to 1, does it mean that, on a computer with 4 cores for example, there will be only 1 thread shared by the four cores?
  2. Why does using 1 thread not seem to affect my task very much in terms of speed? My network has the following structure: dropout, conv1d, maxpooling, lstm, globalmaxpooling, dropout, dense. The post cited above says that if there are a lot of matrix multiplication and subtraction operations, a multi-threaded setting can help. I do not know much about the math underneath, but I'd imagine there are quite a lot of such matrix operations in my model? However, changing both params from 0 (the default) to 1 only slows a 10-minute task down by about 1 minute.
  3. Why can multi-threading be a source of non-reproducible results? See Results not reproducible with Keras and TensorFlow in Python. This is the main reason I need to use a single thread, as I am doing scientific experiments. And surely tensorflow has been improving over time, so why is this not addressed in a release?

Many thanks!

Answer

  1. When both parameters are set to 1, there will be 1 thread running on 1 of the 4 cores. The core it runs on might change, but there will only ever be 1 thread running at a time.

When running something in parallel, there is always a trade-off between time lost on communication and time gained through parallelization. Depending on the hardware used and the specific task (such as the size of the matrices), the speedup varies. Sometimes running something in parallel is even slower than using one core.
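The trade-off above can be illustrated outside tensorflow with a toy stdlib sketch (the chunk sizes and worker count are arbitrary choices for the demonstration): for a small workload, the cost of starting workers and moving data between processes can outweigh the parallel speedup.

```python
import time
from multiprocessing import Pool

def square_sum(chunk):
    # CPU-bound work on one chunk of numbers
    return sum(x * x for x in chunk)

def serial(data):
    # One core does all the work; no communication cost.
    return square_sum(data)

def parallel(data, workers=4):
    # Split the data and farm chunks out to worker processes;
    # the answer is identical, but process startup and IPC cost time.
    chunks = [data[i::workers] for i in range(workers)]
    with Pool(workers) as pool:
        return sum(pool.map(square_sum, chunks))

if __name__ == "__main__":
    data = list(range(10_000))  # deliberately small: overhead dominates
    t0 = time.perf_counter(); s = serial(data); t1 = time.perf_counter()
    p = parallel(data); t2 = time.perf_counter()
    assert s == p  # same result either way
    print("serial: %.4fs, parallel: %.4fs" % (t1 - t0, t2 - t1))
```

On a workload this small, the parallel version is often the slower one; only once the per-chunk work grows does parallelization pay for its overhead.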

For example, when using floats on a CPU, (a + b) + c will not be equal to a + (b + c) because of floating-point rounding. Using multiple parallel threads means that operations like a + b + c will not always be computed in the same order, leading to slightly different results on each run. However, those differences are extremely small and will not affect the overall result in most cases. Completely reproducible results are usually only needed for debugging. Enforcing complete reproducibility would slow down multi-threading a lot.
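The non-associativity is easy to see with plain Python floats (the same IEEE-754 doubles a CPU uses): the grouping of the sum changes the rounded result.

```python
# Regrouping a floating-point sum changes the rounding, hence the result.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c    # one evaluation order
right = a + (b + c)   # the other evaluation order
print(left == right)  # False: the two orders round differently
print(left, right)    # 0.6000000000000001 vs 0.6
```

When threads race to accumulate partial sums, the effective grouping differs from run to run, which is exactly why multi-threaded training is not bit-for-bit reproducible.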

