Using multiple CPU cores in TensorFlow


Problem description


I have extensively studied other answers on TensorFlow and I just cannot seem to get it to use multiple cores on my CPU.


According to htop, the following program only uses a single CPU core:

import tensorflow as tf

n_cpus = 20

sess = tf.Session(config=tf.ConfigProto(
    device_count={ "CPU": n_cpus },
    inter_op_parallelism_threads=n_cpus,
    intra_op_parallelism_threads=1,
))

size = 100000

A = tf.ones([size, size], name="A")
B = tf.ones([size, size], name="B")
C = tf.ones([size, size], name="C")

with tf.device("/cpu:0"):
    x = tf.matmul(A, B)
with tf.device("/cpu:1"):
    y = tf.matmul(A, C)

sess.run([x, y])

# run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
# run_metadata = tf.RunMetadata()
# sess.run([x, y], options=run_options, run_metadata=run_metadata)

# for device in run_metadata.step_stats.dev_stats:
#     device_name = device.device
#     print(device.device)
#     for node in device.node_stats:
#         print("   ", node.node_name)


However, when I uncomment the lines at the bottom, and change size so that the computation actually finishes in a reasonable amount of time, I see that TensorFlow seems to think it's using at least 2 CPU devices:

/job:localhost/replica:0/task:0/device:CPU:0
    _SOURCE
    MatMul
    _retval_MatMul_0_0
    _retval_MatMul_1_0_1
/job:localhost/replica:0/task:0/device:CPU:1
    _SOURCE
    MatMul_1


Fundamentally, what I want to do here is execute different ops on different cores in parallel. I don't want to split a single op over multiple cores, though I know that happens to work in this contrived example. Both device_count and inter_op_parallelism_threads sound like what I want, but neither seems to actually result in using multiple cores. I've tried all combinations I can think of, including setting one or the other to 1 in case they conflict with each other, and nothing seems to work.


I can also confirm with taskset that I'm not doing anything strange with my CPU affinity:

$ taskset -p $$
pid 21395's current affinity mask: ffffffffff


What exactly do I have to do to this code to get it to use multiple CPU cores?

Notes:

  • From this answer, among others, I'm setting device_count and inter_op_parallelism_threads.
  • The tracing command comes from this answer.
  • I can remove the tf.device calls and it doesn't seem to make any difference to my CPU utilization.


I'm using TensorFlow 1.10.0 installed from conda.

Recommended answer


After some back and forth on the TensorFlow issue (linked below), we determined that the problem was that the graph was being "optimized" by a constant-folding pass, because the inputs were all trivial. It turns out this constant-folding pass runs sequentially, so no parallelism is observed. Therefore, if you want to observe parallel execution, you have to make the inputs non-trivial so that constant folding won't apply to them. The method suggested in the issue was to use tf.placeholder, and I have written an example program that makes use of this here:

https://gist.github.com/elliottslaughter/750a27c832782f4daec8686281027de8


See the original issue for sample output from the program: https://github.com/tensorflow/tensorflow/issues/22619
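
For reference, here is a minimal sketch of that approach (assuming TensorFlow 1.x; the size, n_cpus, and thread settings below are illustrative choices, not taken verbatim from the gist):

import numpy as np
import tensorflow as tf

n_cpus = 2
size = 5000

sess = tf.Session(config=tf.ConfigProto(
    device_count={"CPU": n_cpus},
    inter_op_parallelism_threads=n_cpus,
    intra_op_parallelism_threads=1,
))

# Placeholders instead of tf.ones: the inputs are no longer graph constants,
# so the sequential constant-folding pass cannot pre-compute the matmuls.
A = tf.placeholder(tf.float32, shape=[size, size], name="A")
B = tf.placeholder(tf.float32, shape=[size, size], name="B")
C = tf.placeholder(tf.float32, shape=[size, size], name="C")

with tf.device("/cpu:0"):
    x = tf.matmul(A, B)
with tf.device("/cpu:1"):
    y = tf.matmul(A, C)

# Feed concrete values at run time; the two matmuls have no data dependence,
# so they can be scheduled on separate inter-op threads.
ones = np.ones([size, size], dtype=np.float32)
sess.run([x, y], feed_dict={A: ones, B: ones, C: ones})

With non-constant inputs, both matmuls run under the inter-op thread pool, and htop should show more than one core busy.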

