Looping over tf.data.Dataset is very slow


Question

I'm wondering why a for loop over the samples of a tf.data.Dataset is so much slower than a loop over the corresponding NumPy array.

import numpy as np
import tensorflow as tf
import time

a = np.ones(100000, dtype=np.float32)

start_time = time.time()
for x in a:
    pass
print(time.time() - start_time)

start_time = time.time()
for x in tf.data.Dataset.from_tensor_slices(a):
    pass
print(time.time() - start_time)

Output:

0.05548405647277832
5.67711615562439

My TensorFlow version is 2.0.0.

Recommended answer

Yes, I have observed the same behavior. To improve performance, try wrapping the loop over the tf.data.Dataset in a function decorated with @tf.function; it then takes almost the same time as the NumPy loop.

AutoGraph is on by default in tf.function and transforms your eager Python code into graph-compatible TensorFlow ops. This includes control flow such as if, for, and while.
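As a rough illustration of that conversion, here is a minimal sketch. The function name `sum_positive` and its input values are made up for this example and are not part of the original answer. The Python `for` and `if` operate on tensors, so AutoGraph rewrites them into graph control flow when the function is traced.

```python
import tensorflow as tf

@tf.function
def sum_positive(values):
    # `values` is a tensor, so AutoGraph converts this Python loop and
    # conditional into graph control-flow ops when the function is traced.
    total = tf.constant(0.0)
    for v in values:
        if v > 0:
            total += v
    return total

result = sum_positive(tf.constant([1.0, -2.0, 3.0]))
print(result.numpy())

# Inspect the code AutoGraph generated for the undecorated Python function.
generated = tf.autograph.to_code(sum_positive.python_function)
print(generated)
```

The generated source is verbose and differs between TensorFlow versions, so it is mainly useful for checking that a given loop or branch was actually converted.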

tf.function works best with TensorFlow ops; NumPy and Python calls are converted to constants.
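One way to see what "converted to constants" means is the small sketch below. The function `traced_timestamp` is a hypothetical name introduced only for this illustration. A Python call such as `time.time()` is evaluated once, while the function is traced, and later calls with the same input signature reuse that value.

```python
import time
import tensorflow as tf

@tf.function
def traced_timestamp(x):
    # time.time() is a plain Python call: it runs while the function is
    # traced, and its result is embedded in the graph as a constant.
    return tf.constant(time.time(), dtype=tf.float64)

first = traced_timestamp(tf.constant(1.0))
time.sleep(0.1)
# Same dtype and shape as before, so the existing trace is reused.
second = traced_timestamp(tf.constant(2.0))

print(first.numpy() == second.numpy())
```

Both calls return the timestamp captured during tracing, even though they happen at different times.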

Please refer to the code below, which wraps the loop in @tf.function:

@tf.function
def oper(a):
    start_time = time.time()
    for x in tf.data.Dataset.from_tensor_slices(a):
        pass
    print(time.time() - start_time)

The complete working code comparing NumPy and tf.data.Dataset performance is shown below:

import numpy as np
import tensorflow as tf
import time

a = np.ones(100000, dtype=np.float32)

start_time = time.time()
for x in a:
    pass
print(time.time() - start_time)


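@tf.function
def oper(a):
    # Note: time.time() and print() are plain Python calls, so inside
    # tf.function they run while the function is being traced.
    start_time = time.time()
    for x in tf.data.Dataset.from_tensor_slices(a):
        pass
    print(time.time() - start_time)

oper(a)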

Output:

0.012496232986450195
0.017792224884033203
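Because Python calls inside a tf.function run at trace time, a timer placed inside the function may mostly reflect tracing rather than the execution of the loop itself. A more cautious way to measure is to time the call from outside, after a warm-up call. The sketch below is an assumption-laden variant and not the answerer's code: the name `count_elements` and the counting loop body are introduced here so that the loop has a result that can be returned and checked.

```python
import time
import numpy as np
import tensorflow as tf

a = tf.constant(np.ones(100000, dtype=np.float32))

@tf.function
def count_elements(values):
    # Count the elements so the loop produces a value that is returned.
    count = tf.constant(0, dtype=tf.int64)
    for _ in tf.data.Dataset.from_tensor_slices(values):
        count += 1
    return count

count_elements(a)  # warm-up call: triggers tracing

start = time.perf_counter()
n = count_elements(a)  # reuses the trace; runs the graph
elapsed = time.perf_counter() - start
print(int(n), elapsed)
```

Timings obtained this way depend on hardware and TensorFlow version, so they are best compared against the eager loop on the same machine.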

To learn more about tf.function, please refer to this.
