Tensorflow execution time


Problem Description

I have a function within a Python script that I am calling multiple times (https://github.com/sankhaMukherjee/NNoptExpt/blob/dev/src/lib/NNlib/NNmodel.py). I have simplified the function significantly for this example.

def errorValW(self, X, y, weights):

    errVal = None

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())

        # Split the flat list of parameters into weights and biases
        nW = len(self.allW)
        W = weights[:nW]
        B = weights[nW:]

        # Copy the supplied values into the model's variables
        for i in range(len(W)):
            sess.run(tf.assign( self.allW[i], W[i] ))

        for i in range(len(B)):
            sess.run(tf.assign( self.allB[i], B[i] ))

        # Evaluate the error op on the provided data
        errVal = sess.run(self.err, 
            feed_dict = {self.Inp: X, self.Op: y})

    return errVal

I am calling this function many times from another function. When I look at the program log, it appears that this function keeps taking longer and longer. A partial log is shown:

21:37:12,634 - ... .errorValW ... - Finished the function [errorValW] in 1.477610e+00 seconds
21:37:14,116 - ... .errorValW ... - Finished the function [errorValW] in 1.481470e+00 seconds
21:37:15,608 - ... .errorValW ... - Finished the function [errorValW] in 1.490914e+00 seconds
21:37:17,113 - ... .errorValW ... - Finished the function [errorValW] in 1.504651e+00 seconds
21:37:18,557 - ... .errorValW ... - Finished the function [errorValW] in 1.443876e+00 seconds
21:37:20,183 - ... .errorValW ... - Finished the function [errorValW] in 1.625608e+00 seconds
21:37:21,719 - ... .errorValW ... - Finished the function [errorValW] in 1.534915e+00 seconds
... many lines later  
22:59:26,524 - ... .errorValW ... - Finished the function [errorValW] in 9.576592e+00 seconds
22:59:35,991 - ... .errorValW ... - Finished the function [errorValW] in 9.466405e+00 seconds
22:59:45,708 - ... .errorValW ... - Finished the function [errorValW] in 9.716456e+00 seconds
22:59:54,991 - ... .errorValW ... - Finished the function [errorValW] in 9.282923e+00 seconds
23:00:04,407 - ... .errorValW ... - Finished the function [errorValW] in 9.415035e+00 seconds

Has anyone else experienced anything like this? This is totally baffling to me ...

Edit:

For reference, the initializer for the class is shown below. I suspect that the graph for the result variable is progressively increasing in size. I have seen this problem when I try to save models with tf.train.Saver(tf.trainable_variables()), where the size of the saved file keeps increasing. I am not sure whether I am making a mistake in defining the model in some way ...

def __init__(self, inpSize, opSize, layers, activations):

    self.inpSize = inpSize
    self.Inp     = tf.placeholder(dtype=tf.float32, shape=inpSize, name='Inp')
    self.Op      = tf.placeholder(dtype=tf.float32, shape=opSize, name='Op')

    self.allW    = []
    self.allB    = []

    self.result  = None

    # Build one weight matrix and one bias per layer, chaining the matmuls
    prevSize = inpSize[0]
    for i, l in enumerate(layers):
        tempW = tf.Variable( 0.1*(np.random.rand(l, prevSize) - 0.5), dtype=tf.float32, name='W_{}'.format(i) )
        tempB = tf.Variable( 0, dtype=tf.float32, name='B_{}'.format(i) )

        self.allW.append( tempW )
        self.allB.append( tempB )

        if i == 0:
            self.result = tf.matmul( tempW, self.Inp ) + tempB
        else:
            self.result = tf.matmul( tempW, self.result ) + tempB

        prevSize = l

        if activations[i] is not None:
            self.result = activations[i]( self.result )

    # Root-mean-squared error between the target and the network output
    self.err = tf.sqrt(tf.reduce_mean((self.Op - self.result)**2))

    return
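
One way to check whether the graph really is growing (a diagnostic sketch, not part of the original question; model, X, y, and weights stand in for your own objects) is to count the ops in the default graph around a call to errorValW:

import tensorflow as tf

# Diagnostic sketch: if the op count rises with every call, the graph is growing.
before = len(tf.get_default_graph().get_operations())
model.errorValW(X, y, weights)    # 'model', 'X', 'y', 'weights' are your own objects
after  = len(tf.get_default_graph().get_operations())
print('ops added by one call:', after - before)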

Recommended Answer

You are calling tf.assign inside the session context. This keeps adding ops to your graph every time you execute the errorValW function, slowing down execution as the graph grows larger. As a rule of thumb, you should avoid calling Tensorflow ops while executing a model on data (since this will usually happen inside a loop, resulting in constant growth of the graph). In my personal experience, even adding only "a few" ops during execution can result in extreme slowdown.
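
One way to catch this class of bug early (a sketch, assuming the TF 1.x graph API that the question uses) is to finalize the graph once the model is built; any later attempt to add an op then raises an error instead of silently slowing things down:

# After the model is fully built, lock the graph (TF 1.x).
tf.get_default_graph().finalize()

# From here on, any stray op-creating call such as tf.assign(...) raises
# "RuntimeError: Graph is finalized and cannot be modified."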

Note that tf.assign is an op like any other. You should define it once up front (when creating the model/building the graph) and then run that same op repeatedly after launching the session.

I don't know exactly what you are trying to achieve in your code snippet, but consider the following:

...
with tf.Session() as sess:
    sess.run(tf.assign(some_var, a_value))

can be replaced with

a_placeholder = tf.placeholder(type_for_a_value, shape_for_a_value)
assign_op = tf.assign(some_var, a_placeholder)
...
with tf.Session() as sess:
    sess.run(assign_op, feed_dict={a_placeholder: a_value})

where a_placeholder should have the same dtype/shape as some_var. I have to admit I haven't tested this snippet, so please let me know if there are issues, but it should be about right.
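
Applied to the class in the question, the same idea might look like this (a sketch I haven't run; phW, phB, assignW, and assignB are hypothetical names, built once in __init__ and reused on every call):

# In __init__, after self.allW / self.allB are built (hypothetical names):
self.phW = [tf.placeholder(tf.float32, shape=w.get_shape()) for w in self.allW]
self.phB = [tf.placeholder(tf.float32, shape=b.get_shape()) for b in self.allB]
self.assignW = [tf.assign(w, p) for w, p in zip(self.allW, self.phW)]
self.assignB = [tf.assign(b, p) for b, p in zip(self.allB, self.phB)]

# errorValW then only runs pre-built ops, so the graph never grows:
def errorValW(self, X, y, weights):
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        nW = len(self.allW)
        W, B = weights[:nW], weights[nW:]
        sess.run(self.assignW, feed_dict=dict(zip(self.phW, W)))
        sess.run(self.assignB, feed_dict=dict(zip(self.phB, B)))
        return sess.run(self.err, feed_dict={self.Inp: X, self.Op: y})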
