tensorflow py_func is handy but makes my training step very slow


Problem Description


I have an efficiency issue using the TensorFlow function py_func.

Context

In my project, I have a batch tensor input_features of shape [?, max_items, m]. The first dimension is set to ? because it is dynamic (the batch is read by a custom TensorFlow reader and shuffled using tf.train.shuffle_batch_join()). The second dimension corresponds to an upper bound (the maximum number of items I can take for an example), and the third dimension corresponds to the feature space. I also have a tensor num_items whose length is the batch size (so its shape is (?,)), indicating the number of items in each example; the remaining positions are set to 0 (in numpy-style notation, input_feature[k, num_items[k]:, :] = 0).
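
For concreteness, here is a minimal sketch of this setup (placeholders stand in for my actual reader pipeline, and max_items = 20 and m = 8 are just example values):

import tensorflow as tf

max_items, m = 20, 8  # example upper bound and feature dimension
# Dynamic batch dimension: the full shape is (?, max_items, m).
input_features = tf.placeholder(tf.float32, [None, max_items, m])
# One entry per example; rows at or beyond num_items[k] are zero padding.
num_items = tf.placeholder(tf.int32, [None])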

Issue

My workflow needs some custom Python operations (especially for dealing with indexing; for instance, I need to perform clustering operations on some chunks of examples), so I use a few numpy functions wrapped in py_func. This works well, but training becomes very slow (around 50 times slower than a model without the py_func), even though the function itself is not time-consuming.

Questions

1 - Is this increase in computing time normal? The function wrapped in py_func gives me a new tensor that is multiplied further on in the process. Does that explain the computing time? (I mean, the gradient may be more difficult to compute with such a function.)

2 - I'm trying to modify my processing to avoid using py_func. However, it was very handy for extracting data with numpy indexing (especially with my data formatting), and I have some difficulty expressing that in a TF way. For instance, suppose I have a tensor t1 with shape [-1, n_max, m] (the first dimension is batch_size, which is dynamic) and a tensor t2 with shape [-1, 2] containing integers. Is there an easy way to perform a mean operation in TensorFlow that results in t_mean_chunk with shape (-1, m), where (in a numpy formulation): t_mean_chunk[i,:] = np.mean(t1[i, t2[i,0]:t2[i,1], :], axis=0)? This was (among other operations) the kind of thing I was doing in the wrapped function.
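
For reference, a py_func version of this operation might look like the following sketch (illustrative, not my exact code; chunk_means_np is a made-up helper name and the placeholder shapes are example values). Note that py_func drops static shape information, so the output shape has to be restored with set_shape:

import numpy as np
import tensorflow as tf

def chunk_means_np(values, ranges):
  # Plain numpy: mean of each example's [start, end) slice of rows.
  return np.stack([values[i, ranges[i, 0]:ranges[i, 1], :].mean(axis=0)
                   for i in range(values.shape[0])]).astype(np.float32)

t1 = tf.placeholder(tf.float32, [None, 10, 5])  # [batch_size, n_max, m]
t2 = tf.placeholder(tf.int32, [None, 2])        # [batch_size, (start, end)]
t_mean_chunk = tf.py_func(chunk_means_np, [t1, t2], tf.float32)
t_mean_chunk.set_shape([None, 5])  # py_func cannot infer this itself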

Solution

Question 1 is hard to answer without the exact py_func, but as hpaulj mentioned in his comment, it's not too surprising that it's slowing things down. As a worst-case fallback, tf.scan or tf.while_loop with a TensorArray may be somewhat faster. However, the best case is to have a vectorized solution with TensorFlow ops, which I think is possible in this case.
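
For illustration, a loop-based fallback along those lines might look like the following sketch (using tf.map_fn, which is built on tf.while_loop and a TensorArray; range_mean_loop is just an illustrative name, computing the same quantity as the vectorized version below, one example at a time):

import tensorflow as tf

def range_mean_loop(index_ranges, values):
  # One reduce_mean per example; stays in-graph but is not vectorized.
  def one_example(args):
    rng, vals = args
    return tf.reduce_mean(vals[rng[0]:rng[1]], axis=0)
  return tf.map_fn(one_example, (index_ranges, values), dtype=values.dtype)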

As for question 2, I'm not sure if it counts as easy, but here's a function which computes your indexing expression:

import tensorflow as tf

def range_mean(index_ranges, values):
  """Take the mean of `values` along ranges specified by `index_ranges`.

  return[i, ...] = tf.reduce_mean(
    values[i, index_ranges[i, 0]:index_ranges[i, 1], ...], axis=0)

  Args:
    index_ranges: An integer Tensor with shape [N x 2]
    values: A Tensor with shape [N x M x ...].
  Returns:
    A Tensor with shape [N x ...] containing the means of `values` having
    indices in the ranges specified.
  """
  m_indices = tf.range(tf.shape(values)[1])[None]
  # Determine which parts of `values` will be in the result
  selected = tf.logical_and(tf.greater_equal(m_indices, index_ranges[:, :1]),
                            tf.less(m_indices, index_ranges[:, 1:]))
  n_indices = tf.tile(tf.range(tf.shape(values)[0])[..., None],
                      [1, tf.shape(values)[1]])
  segments = tf.where(selected, n_indices + 1, tf.zeros_like(n_indices))
  # Throw out segment 0, since that's our "not included" segment
  segment_sums = tf.unsorted_segment_sum(
      data=values,
      segment_ids=segments, 
      num_segments=tf.shape(values)[0] + 1)[1:]
  divisor = tf.cast(index_ranges[:, 1] - index_ranges[:, 0],
                    dtype=values.dtype)
  # Pad the shape of `divisor` so that it broadcasts against `segment_sums`.
  divisor_shape_padded = tf.reshape(
      divisor,
      tf.concat([tf.shape(divisor), 
                 tf.ones([tf.rank(values) - 2], dtype=tf.int32)], axis=0))
  return segment_sums / divisor_shape_padded
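
The idea is to turn each example's range into a boolean mask over item positions, convert the mask into segment ids (with segment 0 reserved for positions outside their range, which is why the ids are shifted by one and the first segment is thrown away), and let tf.unsorted_segment_sum add up each example's selected rows in a single vectorized op; dividing by the range lengths then turns sums into means. One caveat: an empty range (index_ranges[i, 0] == index_ranges[i, 1]) yields a zero divisor and hence NaN for that row, which matches what np.mean returns on an empty slice.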

Example usage:

import numpy

index_range_tensor = tf.constant([[2, 4], [1, 6], [0, 3], [0, 9]])
values_tensor = tf.reshape(tf.range(4 * 10 * 5, dtype=tf.float32), [4, 10, 5])
with tf.Session():
  tf_result = range_mean(index_range_tensor, values_tensor).eval()
  index_range_np = index_range_tensor.eval()
  values_np = values_tensor.eval()

for i in range(values_np.shape[0]):
  print("Slice {}: ".format(i),
        tf_result[i],
        numpy.mean(values_np[i, index_range_np[i, 0]:index_range_np[i, 1], :],
                   axis=0))

Prints:

Slice 0:  [ 12.5  13.5  14.5  15.5  16.5] [ 12.5  13.5  14.5  15.5  16.5]
Slice 1:  [ 65.  66.  67.  68.  69.] [ 65.  66.  67.  68.  69.]
Slice 2:  [ 105.  106.  107.  108.  109.] [ 105.  106.  107.  108.  109.]
Slice 3:  [ 170.  171.  172.  173.  174.] [ 170.  171.  172.  173.  174.]
