What is the proper way to benchmark part of a TensorFlow graph?
Question
I want to benchmark part of a graph. For simplicity, I use conv_block here, which is just a 3x3 convolution.
- Is it OK for the x_np used in the loop to be the same each time, or do I need to regenerate it on every iteration?
- Do I need to do some 'warm-up' runs before the actual benchmark (this seems to be needed when benchmarking on a GPU)? How do I do that properly? Is sess.run(tf.global_variables_initializer()) enough?
- What is the proper, more precise way to measure time in Python?
- Do I need to reset some system cache on Linux before running the script (or is disabling np.random.seed sufficient)?
Example code:
import os
import time

import numpy as np
import tensorflow as tf

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '1'
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)
np.random.seed(2020)


def conv_block(x, kernel_size=3):
    # Define some part of graph here
    bs, h, w, c = x.shape
    in_channels = c
    out_channels = c
    with tf.variable_scope('var_scope'):
        w_0 = tf.get_variable('w_0', [kernel_size, kernel_size, in_channels, out_channels], initializer=tf.contrib.layers.xavier_initializer())
        x = tf.nn.conv2d(x, w_0, [1, 1, 1, 1], 'SAME')
    return x


def get_data_batch(spatial_size, n_channels):
    bs = 1
    h = spatial_size
    w = spatial_size
    c = n_channels
    x_np = np.random.rand(bs, h, w, c)
    x_np = x_np.astype(np.float32)
    #print('x_np.shape', x_np.shape)
    return x_np


def run_graph_part(f_name, spatial_size, n_channels, n_iter=100):
    print('=' * 60)
    print(f_name.__name__)
    tf.reset_default_graph()
    with tf.Session() as sess:
        x_tf = tf.placeholder(tf.float32, [1, spatial_size, spatial_size, n_channels], name='input')
        z_tf = f_name(x_tf)
        sess.run(tf.global_variables_initializer())
        x_np = get_data_batch(spatial_size, n_channels)
        start_time = time.time()
        for _ in range(n_iter):
            z_np = sess.run(fetches=[z_tf], feed_dict={x_tf: x_np})[0]
        avr_time = (time.time() - start_time) / n_iter
        print('z_np.shape', z_np.shape)
        print('avr_time', round(avr_time, 3))
        n_total_params = 0
        for v in tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='var_scope'):
            n_total_params += np.prod(v.get_shape().as_list())
        print('Number of parameters:', format(n_total_params, ',d'))


if __name__ == '__main__':
    run_graph_part(conv_block, spatial_size=128, n_channels=32, n_iter=100)
Recommended answer
An answer to your primary question, 'What is the proper way to benchmark part of a TensorFlow graph?':
TensorFlow includes an abstract class that provides helpers for TensorFlow benchmarks: Benchmark (tf.test.Benchmark).
So a Benchmark object can be created and used to benchmark part of a TensorFlow graph. In the code below, a benchmark object is instantiated and its run_op_benchmark method is called. run_op_benchmark is passed the session, the conv_block tensor (in this case), a feed_dict, a number of burn-in iterations, the desired minimum number of iterations, a boolean flag that keeps the benchmark from also measuring memory usage, and a convenient name. The method returns a dictionary containing the benchmark results:
benchmark = tf.test.Benchmark()
results = benchmark.run_op_benchmark(sess=sess, op_or_tensor=z_tf,
                                     feed_dict={x_tf: x_np}, burn_iters=2,
                                     min_iters=n_iter,
                                     store_memory_usage=False, name='example')
This block of code can be inserted into your code as follows to compare the two benchmarking approaches:
import os
import time

import numpy as np
import tensorflow as tf

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '1'
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)
np.random.seed(2020)


def conv_block(x, kernel_size=3):
    # Define some part of graph here
    bs, h, w, c = x.shape
    in_channels = c
    out_channels = c
    with tf.compat.v1.variable_scope('var_scope'):
        w_0 = tf.get_variable('w_0', [kernel_size, kernel_size, in_channels, out_channels], initializer=tf.keras.initializers.glorot_normal())
        x = tf.nn.conv2d(x, w_0, [1, 1, 1, 1], 'SAME')
    return x


def get_data_batch(spatial_size, n_channels):
    bs = 1
    h = spatial_size
    w = spatial_size
    c = n_channels
    x_np = np.random.rand(bs, h, w, c)
    x_np = x_np.astype(np.float32)
    #print('x_np.shape', x_np.shape)
    return x_np


def run_graph_part(f_name, spatial_size, n_channels, n_iter=100):
    print('=' * 60)
    print(f_name.__name__)
    tf.reset_default_graph()
    with tf.Session() as sess:
        x_tf = tf.placeholder(tf.float32, [1, spatial_size, spatial_size, n_channels], name='input')
        z_tf = f_name(x_tf)
        sess.run(tf.global_variables_initializer())
        x_np = get_data_batch(spatial_size, n_channels)
        start_time = time.time()
        for _ in range(n_iter):
            z_np = sess.run(fetches=[z_tf], feed_dict={x_tf: x_np})[0]
        avr_time = (time.time() - start_time) / n_iter
        print('z_np.shape', z_np.shape)
        print('avr_time', round(avr_time, 3))
        n_total_params = 0
        for v in tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='var_scope'):
            n_total_params += np.prod(v.get_shape().as_list())
        print('Number of parameters:', format(n_total_params, ',d'))
        # USING TENSORFLOW BENCHMARK
        benchmark = tf.test.Benchmark()
        results = benchmark.run_op_benchmark(sess=sess, op_or_tensor=z_tf,
                                             feed_dict={x_tf: x_np}, burn_iters=2, min_iters=n_iter,
                                             store_memory_usage=False, name='example')
        return results


if __name__ == '__main__':
    results = run_graph_part(conv_block, spatial_size=128, n_channels=32, n_iter=100)
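To compare the two measurements, the returned dictionary can be turned into a readable line. The helper below is only a sketch: summarize_benchmark is a hypothetical name, and the keys 'name', 'iters' and 'wall_time' (seconds per run) are an assumption about what run_op_benchmark returns, so check them against your TensorFlow version.

```python
def summarize_benchmark(results):
    """Return a one-line summary of a run_op_benchmark-style result dict.

    Assumption: the dict carries 'name', 'iters' and 'wall_time' (seconds
    per run). Using .get() keeps the helper from raising if a key is absent.
    """
    name = results.get("name", "<unnamed>")
    iters = results.get("iters")
    wall_time = results.get("wall_time")
    if wall_time is None:
        return "{}: no wall_time reported".format(name)
    # Convert seconds to milliseconds for easier reading.
    return "{}: {:.3f} ms per run ({} iterations)".format(name, wall_time * 1e3, iters)
```

In the script above it could be called as print(summarize_benchmark(results)) at the end of the __main__ block, next to the avr_time line printed by the manual loop.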
The implementation of this benchmarking class within the TensorFlow library itself provides hints about the answers to your other questions. Since the TensorFlow implementation does not require a new feed_dict for each benchmark iteration, the answer to question 1), 'Is it OK for the x_np used in the loop to be the same each time, or do I need to regenerate it?', appears to be that it is fine to reuse the same x_np in every iteration. Regarding question 2), some 'warm-up' does appear to be necessary. The default number of burn-in iterations suggested by the TensorFlow implementation is 2.
Regarding question 3), timeit is an excellent tool for measuring the execution time of small code snippets. However, the TensorFlow library itself uses time.time() in much the same way you have (see the source of run_op_benchmark). Interestingly, the TensorFlow benchmark implementation reports the median rather than the mean of the operation wall times, presumably to make the benchmark more robust to outliers.
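The warm-up and median points can be combined in a few lines of standard-library Python. This is a minimal sketch, not TensorFlow's implementation: benchmark_callable and its result keys are hypothetical names, and time.perf_counter is used here as a higher-resolution alternative to time.time().

```python
import statistics
import time


def benchmark_callable(fn, burn_iters=2, min_iters=10):
    """Time fn() after a few untimed warm-up calls and report the median.

    Hypothetical helper (not part of TensorFlow) that mimics the
    warm-up-then-median pattern described above.
    """
    # Warm-up calls run but are not timed, so one-off costs such as
    # lazy initialization do not skew the measurement.
    for _ in range(burn_iters):
        fn()

    deltas = []
    for _ in range(min_iters):
        start = time.perf_counter()  # monotonic, high-resolution clock
        fn()
        deltas.append(time.perf_counter() - start)

    return {
        "iters": len(deltas),
        "median_s": statistics.median(deltas),  # robust to outliers
        "mean_s": statistics.mean(deltas),
    }


if __name__ == "__main__":
    # Stand-in workload; with the script above this could instead be
    # lambda: sess.run(z_tf, feed_dict={x_tf: x_np})
    stats = benchmark_callable(lambda: sum(range(10000)), burn_iters=2, min_iters=50)
    print("median: {:.6f} s over {} iterations".format(stats["median_s"], stats["iters"]))
```

Timing each call separately, rather than the whole loop, is what makes a median possible; the cost is a small per-iteration timer overhead, which matters only for very short operations.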