In TensorFlow's Dataset API, how do you map one element into multiple elements?

Question

In a TensorFlow Dataset pipeline, I'd like to define a custom map function that takes a single input element (data sample) and returns multiple elements (data samples).

The code below is my attempt, along with the desired results.

I could not follow the documentation on tf.data.Dataset.flat_map() well enough to tell whether it applies here.

import tensorflow as tf

input = [10, 20, 30]

def my_map_func(i):
  return [[i, i+1, i+2]]       # Fyi [[i], [i+1], [i+2]] throws an exception

ds = tf.data.Dataset.from_tensor_slices(input)
ds = ds.map(map_func=lambda input: tf.py_func(
  func=my_map_func, inp=[input], Tout=[tf.int64]
))
element = ds.make_one_shot_iterator().get_next()

with tf.Session() as sess:
  for _ in range(9):
    print(sess.run(element))

Results:

(array([10, 11, 12]),)
(array([20, 21, 22]),)
(array([30, 31, 32]),)

Desired results:

(10)
(11)
(12)
(20)
(21)
(22)
(30)
(31)
(32)

Recommended answer

Two more steps are required to achieve this. First, the map function needs to return a NumPy array, not a list.

Then you can use flat_map combined with Dataset.from_tensor_slices() to flatten them: flat_map turns each returned array into its own small dataset and concatenates those datasets in order. The code below produces the desired result:

Tested in TensorFlow 1.5 (copy/paste runnable example):

import tensorflow as tf
import numpy as np

input = [10, 20, 30]

def my_map_func(i):
  return np.array([i, i + 1, i + 2])

ds = tf.data.Dataset.from_tensor_slices(input)
ds = ds.map(map_func=lambda input: tf.py_func(
  func=my_map_func, inp=[input], Tout=[tf.int64]
))
ds = ds.flat_map(lambda x: tf.data.Dataset.from_tensor_slices(x))

element = ds.make_one_shot_iterator().get_next()

with tf.Session() as sess:
  for _ in range(9):
    print(sess.run(element))
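
The example above uses TF 1.x-style APIs (tf.py_func, make_one_shot_iterator, tf.Session). As a rough sketch, assuming TensorFlow 2.x with eager execution, the same one-to-many mapping can be expressed with tensor ops only, so no Python callback is needed. The names values and expand are illustrative and do not come from the original answer.

```python
import tensorflow as tf

values = [10, 20, 30]

def expand(i):
    # One scalar in, a length-3 vector out (illustrative helper).
    return tf.stack([i, i + 1, i + 2])

ds = tf.data.Dataset.from_tensor_slices(values)
# flat_map builds a small dataset per input element and concatenates them in order.
ds = ds.flat_map(lambda i: tf.data.Dataset.from_tensor_slices(expand(i)))

result = [int(x) for x in ds.as_numpy_iterator()]
print(result)  # [10, 11, 12, 20, 21, 22, 30, 31, 32]
```

Keeping the expansion in tensor ops lets tf.data trace the whole function into a graph. When the expansion truly needs arbitrary Python, a Python callback along the lines of the answer's approach is still required.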

If you have multiple variables to return, here is one way to do it. In this example, the input is a string (such as a filename) and the output is several pairs of a string and an integer: the string is repeated once for each of the integers in [10, 20, 30].

Copy/paste runnable example:

import tensorflow as tf
import numpy as np

input = [b'testA', b'testB', b'testC']

def my_map_func(input):
  return np.array([input, input, input]), np.array([10, 20, 30])

ds = tf.data.Dataset.from_tensor_slices(input)
ds = ds.map(map_func=lambda input: tf.py_func(
    func=my_map_func, inp=[input], Tout=[tf.string, tf.int64]))
ds = ds.flat_map(lambda mystr, myint: tf.data.Dataset.zip((
  tf.data.Dataset.from_tensor_slices(mystr),
  tf.data.Dataset.from_tensor_slices(myint))
))

element = ds.make_one_shot_iterator().get_next()

with tf.Session() as sess:
  for _ in range(9):
    print(sess.run(element))
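
For comparison, here is a hedged sketch of the multi-output case, again assuming TensorFlow 2.x with eager execution. It passes a tuple to from_tensor_slices, which slices both components in lockstep and so stands in for the zip of two datasets. The names names, expand and pairs are illustrative and are not part of the original answer.

```python
import tensorflow as tf

names = [b"testA", b"testB", b"testC"]

def expand(name):
    # Repeat the string three times and pair it with three integers
    # (illustrative helper, mirroring the answer's my_map_func).
    repeated = tf.stack([name, name, name])
    numbers = tf.constant([10, 20, 30], dtype=tf.int64)
    return repeated, numbers

ds = tf.data.Dataset.from_tensor_slices(names)
# Slicing a tuple yields (string, integer) pairs, one per row.
ds = ds.flat_map(lambda name: tf.data.Dataset.from_tensor_slices(expand(name)))

pairs = [(bytes(s), int(n)) for s, n in ds.as_numpy_iterator()]
for pair in pairs:
    print(pair)
```

Each input string produces three (string, integer) pairs, so the three inputs yield nine elements in total.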
