How does TensorFlow assign Ops to run on GPU?


Question

I'm confused about the mechanism that TensorFlow uses to assign different Ops to CPUs or GPUs.

  1. Taking the pseudocode below as an example: can we say that as long as SimpleOp is created within the context of with tf.device('/gpu:0'), it will surely run on the GPU (assuming a GPU implementation of SimpleOp is available), no matter whether its input variables (in_1 and in_2) are created on the CPU or GPU?

with tf.device('/gpu:0'):
    out = tf.SimpleOp(in_1, in_2, name='Simple')

  2. I understand that by creating a session with log_device_placement=True (set as sketched below), TensorFlow outputs the device placements of all variables/Ops. However, is there a method that lets me check only one Op's device assignment?
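
For reference, a minimal sketch of how that flag is set (TF1-style API; the tensor here is only for illustration):

import tensorflow as tf

# log_device_placement=True makes TensorFlow log the placement decision
# for every op in the graph to stderr
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
a = tf.ones((3,), name='A')
sess.run(a)  # one placement line is printed per op, not per-op queryable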

    Thanks in advance!

    Answer

    TL;DR: an op created inside with tf.device("/gpu:0") will always run on the GPU. If you specify that the inputs be placed on the CPU, they will be placed on the CPU. If you omit device specifications for the inputs, they will be placed on the GPU to be closer to your op. You can use run_metadata to get a Python object with all device assignments and look up your op there.

    Placement is done by the misleadingly named simple_placer.cc, and while the comments there specify the mechanics, there are still some bugs getting hashed out (i.e., here), so the best way is to check it in practice.

    When you say that variables are created on the GPU, there are actually two kinds of placement: explicit, when you create the relevant op inside a with tf.device block, and implicit, outside of such a block. Creating ops outside of with tf.device is equivalent to creating them in a with tf.device(None) block.
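
    To make the distinction concrete, here is a small sketch (my own illustration, TF1 graph mode). Note that op.device only reflects the requested device; implicitly placed ops keep an empty string until the runtime places them, which is why run_metadata below is needed to see the actual assignment:

    import tensorflow as tf

    tf.reset_default_graph()
    a = tf.ones((3,), name="A")        # implicit: no device requested
    with tf.device(None):
        b = tf.ones((3,), name="B")    # equivalent: still no request
    with tf.device("/gpu:0"):
        c = tf.ones((3,), name="C")    # explicit request
    print(repr(a.op.device))  # '' -- empty, placement left to the runtime
    print(repr(b.op.device))  # '' -- same as above
    print(repr(c.op.device))  # the requested GPU device string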

    Here is a simple experiment:

    import tensorflow as tf

    n = 10**6
    def inputs_cpu():
        tf.reset_default_graph()
        with tf.device("/cpu:0"):
            a = tf.ones((n,), name="A")
            b = tf.ones((n,), name="B")
        with tf.device("/gpu:0"):
            c = tf.add(a, b, name="C")
        return c
    
    def inputs_none():
        tf.reset_default_graph()
        a = tf.ones((n,), name="A")
        b = tf.ones((n,), name="B")
        with tf.device("/gpu:0"):
            c = tf.add(a, b, name="C")
        return c
    
    def run_and_summarize(target):
        # turn off graph-rewriting optimizations so the reported placement
        # matches the graph as constructed
        config = tf.ConfigProto(graph_options=tf.GraphOptions(
            optimizer_options=tf.OptimizerOptions(opt_level=tf.OptimizerOptions.L0)))
        sess = tf.Session(config=config)
        run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
        run_metadata = tf.RunMetadata()
        sess.run(target, options=run_options, run_metadata=run_metadata)
    
        for device in run_metadata.step_stats.dev_stats:
            device_name = device.device
            if not (device_name.endswith("/cpu:0") or device_name.endswith("/gpu:0")):
                continue
            print(device.device)
            for node in device.node_stats:
                print("   ", node.node_name)
    

    Now you can run:

    run_and_summarize(inputs_cpu())
    

    This runs with the inputs pinned to the CPU, and you can see that this placement is respected:

    /job:localhost/replica:0/task:0/gpu:0
        _SOURCE
        C
    /job:localhost/replica:0/task:0/cpu:0
        _SOURCE
        A
        B
    

    On the other hand, when no device is specified for the inputs:

    run_and_summarize(inputs_none())
    

    You can see that now all ops are placed on the GPU:

    /job:localhost/replica:0/task:0/cpu:0
        _SOURCE
    /job:localhost/replica:0/task:0/gpu:0
        _SOURCE
        A
        B
        C
    
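    To answer the second question directly: rather than eyeballing the full dump, you can scan the same run_metadata for a single node name. A minimal sketch (find_op_device is a hypothetical helper; it assumes run_and_summarize above is changed to return its run_metadata):

    def find_op_device(run_metadata, op_name):
        # search the runtime step stats for op_name and return the device
        # it actually executed on, or None if the op never ran
        for dev_stats in run_metadata.step_stats.dev_stats:
            for node_stats in dev_stats.node_stats:
                if node_stats.node_name == op_name:
                    return dev_stats.device
        return None

    # e.g., metadata = run_and_summarize(inputs_cpu())
    # find_op_device(metadata, "C")
    # -> '/job:localhost/replica:0/task:0/gpu:0'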
