What is the proper way to benchmark part of a TensorFlow graph?


Question

I want to benchmark some part of a graph; for simplicity I use a conv_block that is just a 3x3 convolution.

  1. Is it OK that the x_np used in the loop is the same, or do I need to regenerate it each time?
  2. Do I need to do some 'warm up' runs before running the actual benchmark (this seems to be needed when benchmarking on a GPU)? How is that done properly? Is sess.run(tf.global_variables_initializer()) enough?
  3. What is the proper way of measuring time in Python, i.e. is there a more precise method?
  4. Do I need to reset some system cache on Linux before running the script (maybe disabling np.random.seed is sufficient)?

Example code:

import os
import time

import numpy as np
import tensorflow as tf

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '1'
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)

np.random.seed(2020)


def conv_block(x, kernel_size=3):
    # Define some part of graph here

    bs, h, w, c = x.shape
    in_channels = c
    out_channels = c

    with tf.variable_scope('var_scope'):
        w_0 = tf.get_variable('w_0', [kernel_size, kernel_size, in_channels, out_channels], initializer=tf.contrib.layers.xavier_initializer())
        x = tf.nn.conv2d(x, w_0, [1, 1, 1, 1], 'SAME')

    return x


def get_data_batch(spatial_size, n_channels):
    bs = 1
    h = spatial_size
    w = spatial_size
    c = n_channels

    x_np = np.random.rand(bs, h, w, c)
    x_np = x_np.astype(np.float32)
    #print('x_np.shape', x_np.shape)

    return x_np


def run_graph_part(f_name, spatial_size, n_channels, n_iter=100):
    print('=' * 60)
    print(f_name.__name__)

    tf.reset_default_graph()
    with tf.Session() as sess:
        x_tf = tf.placeholder(tf.float32, [1, spatial_size, spatial_size, n_channels], name='input')
        z_tf = f_name(x_tf)
        sess.run(tf.global_variables_initializer())

        x_np = get_data_batch(spatial_size, n_channels)
        start_time = time.time()
        for _ in range(n_iter):
            z_np = sess.run(fetches=[z_tf], feed_dict={x_tf: x_np})[0]
        avr_time = (time.time() - start_time) / n_iter
        print('z_np.shape', z_np.shape)
        print('avr_time', round(avr_time, 3))

        n_total_params = 0
        for v in tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='var_scope'):
            n_total_params += np.prod(v.get_shape().as_list())
        print('Number of parameters:', format(n_total_params, ',d'))


if __name__ == '__main__':
    run_graph_part(conv_block, spatial_size=128, n_channels=32, n_iter=100)

Answer

An answer to your primary question, 'What is the proper way to benchmark part of a tensorflow graph?':

Tensorflow includes an abstract class that provides helpers for tensorflow benchmarks: tf.test.Benchmark.

So, a Benchmark object can be created and used to execute a benchmark on part of a tensorflow graph. In the code below, a benchmark object is instantiated and then its run_op_benchmark method is called. run_op_benchmark is passed the session, the conv_block output tensor (in this case), a feed_dict, a number of burn iterations (warm-up runs), the desired minimum number of iterations, a boolean flag to keep the benchmark from also computing memory usage, and a convenient name. The method returns a dictionary containing the benchmark results:

benchmark = tf.test.Benchmark()
results = benchmark.run_op_benchmark(sess=sess, op_or_tensor=z_tf, 
                                     feed_dict={x_tf: x_np}, burn_iters=2, 
                                     min_iters=n_iter, 
                                     store_memory_usage=False, name='example')
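
The returned dictionary can then be inspected directly. A minimal sketch, assuming the results variable from the snippet above (in TF 1.x the dictionary includes keys such as 'wall_time', the median per-iteration time in seconds, and 'iters'; exact keys may vary by version):

# Inspect the dictionary returned by run_op_benchmark.
# 'wall_time' is the median per-iteration time in seconds and
# 'iters' the number of measured iterations (keys may vary by TF version).
print('median wall_time: {:.6f} s'.format(results['wall_time']))
print('iters:', results['iters'])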

This block of code can be inserted into your script as follows, so you can compare the two benchmarks:

import os
import time

import numpy as np
import tensorflow as tf

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '1'
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)

np.random.seed(2020)


def conv_block(x, kernel_size=3):
    # Define some part of graph here

    bs, h, w, c = x.shape
    in_channels = c
    out_channels = c

    with tf.compat.v1.variable_scope('var_scope'):
        w_0 = tf.get_variable('w_0', [kernel_size, kernel_size, in_channels, out_channels], initializer=tf.keras.initializers.glorot_normal())
        x = tf.nn.conv2d(x, w_0, [1, 1, 1, 1], 'SAME')

    return x


def get_data_batch(spatial_size, n_channels):
    bs = 1
    h = spatial_size
    w = spatial_size
    c = n_channels

    x_np = np.random.rand(bs, h, w, c)
    x_np = x_np.astype(np.float32)
    #print('x_np.shape', x_np.shape)

    return x_np


def run_graph_part(f_name, spatial_size, n_channels, n_iter=100):
    print('=' * 60)
    print(f_name.__name__)

    tf.reset_default_graph()
    with tf.Session() as sess:
        x_tf = tf.placeholder(tf.float32, [1, spatial_size, spatial_size, n_channels], name='input')
        z_tf = f_name(x_tf)
        sess.run(tf.global_variables_initializer())

        x_np = get_data_batch(spatial_size, n_channels)
        start_time = time.time()
        for _ in range(n_iter):
            z_np = sess.run(fetches=[z_tf], feed_dict={x_tf: x_np})[0]
        avr_time = (time.time() - start_time) / n_iter
        print('z_np.shape', z_np.shape)
        print('avr_time', round(avr_time, 3))

        n_total_params = 0
        for v in tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='var_scope'):
            n_total_params += np.prod(v.get_shape().as_list())
        print('Number of parameters:', format(n_total_params, ',d'))

        # USING TENSORFLOW BENCHMARK
        benchmark = tf.test.Benchmark()
        results = benchmark.run_op_benchmark(sess=sess, op_or_tensor=z_tf, 
                                             feed_dict={x_tf: x_np}, burn_iters=2, min_iters=n_iter,
                                             store_memory_usage=False, name='example')

        return results


if __name__ == '__main__':
    results = run_graph_part(conv_block, spatial_size=128, n_channels=32, n_iter=100)

This implementation of a benchmarking class within the tensorflow library itself provides hints as to the answers to your other questions. As the tensorflow implementation does not require a new feed_dict for each benchmark iteration, the answer to question 1 ('Is it ok that x_np used in the loop is the same or I need to regenerate it each time?') appears to be that it is OK to use the same x_np each loop. Regarding question 2, some 'warm up' does appear to be necessary; the default number of burn iterations suggested by the tensorflow library implementation is 2. Regarding question 3, timeit is an excellent tool for measuring the execution time of small code snippets, but the tensorflow library itself uses time.time() in a manner similar to yours: run_op_benchmark (source). Interestingly, the tensorflow benchmark implementation reports the median rather than the mean of the operation wall times, presumably to make the benchmark more robust to outliers.
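
If you want the same behaviour without tf.test.Benchmark, here is a minimal hand-rolled sketch, assuming sess, z_tf, x_tf, x_np and n_iter from the script above; it uses time.perf_counter instead of time.time because perf_counter is a higher-resolution monotonic clock:

import statistics
import time

# Warm-up ('burn') iterations, analogous to burn_iters=2 above; not timed.
for _ in range(2):
    sess.run(z_tf, feed_dict={x_tf: x_np})

# Time each iteration separately and report the median, mirroring
# what run_op_benchmark does internally.
deltas = []
for _ in range(n_iter):
    start = time.perf_counter()
    sess.run(z_tf, feed_dict={x_tf: x_np})
    deltas.append(time.perf_counter() - start)

print('median time per iteration: {:.6f} s'.format(statistics.median(deltas)))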
