Creating a TensorFlow dataset that outputs a dict

Question

I have a dict with "metadata" for my dataset, of the form {'m1': array_1, 'm2': array_2, ...}. Each array has shape (N, ...), where N is the number of samples.

The question: Is it possible to create a tf.data.Dataset that outputs a dictionary {'meta_1': sub_array_1, 'meta_2': sub_array_2, ...} on each call to the dataset's iterator.get_next()? Here, sub_array_i should contain the i-th metadata for one batch, so it should have shape (batch_sz, ...).

What I tried so far is using tf.data.Dataset.from_generator(), like this:

import numpy as np
import tensorflow as tf

N = 100
# dictionary of arrays:
metadata = {'m1': np.zeros(shape=(N,2)), 'm2': np.ones(shape=(N,3,5))} 
num_samples = N

def meta_dict_gen():
    for i in range(num_samples):
        ls = {}
        for key, val in metadata.items():
            ls[key] = val[i]
        yield ls

dataset = tf.data.Dataset.from_generator(meta_dict_gen, output_types=(dict))

The problem seems to be output_types=(dict). The code above raises:

TypeError: Expected DataType for argument 'Tout' not <class 'dict'>.


I'm using TensorFlow 1.8 and Python 3.6.

Recommended answer

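It is actually possible to do what you intend; you just have to be specific about the contents of the dict. output_types takes a nested structure of tf.DType objects, so pass a dict that maps each key to its dtype, and do the same for output_shapes: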

import tensorflow as tf
import numpy as np

N = 100
# dictionary of arrays:
metadata = {'m1': np.zeros(shape=(N,2)), 'm2': np.ones(shape=(N,3,5))}
num_samples = N

def meta_dict_gen():
    for i in range(num_samples):
        ls = {}
        for key, val in metadata.items():
            ls[key] = val[i]
        yield ls

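# output_types and output_shapes mirror the structure of the yielded dict:
# one dtype and one per-sample shape for each key.
dataset = tf.data.Dataset.from_generator(
    meta_dict_gen,
    output_types={k: tf.float32 for k in metadata},
    output_shapes={'m1': (2,), 'm2': (3, 5)})
# Named `iterator` to avoid shadowing the built-in iter().
iterator = dataset.make_one_shot_iterator()
next_elem = iterator.get_next()
print(next_elem)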

Output:

{'m1': <tf.Tensor 'IteratorGetNext:0' shape=(2,) dtype=float32>,
 'm2': <tf.Tensor 'IteratorGetNext:1' shape=(3, 5) dtype=float32>}
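
The elements above are single samples with shapes (2,) and (3, 5), while the question asks for batches of shape (batch_sz, ...). Chaining .batch(batch_sz) onto the dataset should add that leading dimension. The sketch below illustrates this with tf.data.Dataset.from_tensor_slices, which accepts a dict of arrays directly and may be a simpler alternative when all the data is already in memory. It is not part of the original answer and rests on a few assumptions: it targets TensorFlow 2.x, where a dataset can be iterated directly (on 1.8 you would keep make_one_shot_iterator as above); batch_sz = 10 is an illustrative value; and the arrays are created as float32 for convenience.

```python
import numpy as np
import tensorflow as tf  # assumption: TensorFlow 2.x with eager execution

N = 100
batch_sz = 10  # illustrative batch size, not taken from the original post

# Dictionary of arrays, each with N samples along the first axis.
metadata = {
    'm1': np.zeros(shape=(N, 2), dtype=np.float32),
    'm2': np.ones(shape=(N, 3, 5), dtype=np.float32),
}

# from_tensor_slices slices every array in the dict along axis 0, so each
# element is a dict with the same keys; batch() then stacks batch_sz of them.
dataset = tf.data.Dataset.from_tensor_slices(metadata).batch(batch_sz)

# In TF 2.x a dataset is directly iterable; take the first batch.
first_batch = next(iter(dataset))
batch_shapes = {k: tuple(v.shape) for k, v in first_batch.items()}
print(batch_shapes)
```

Each batch is a dict with the keys 'm1' and 'm2', and every value carries batch_sz as its leading dimension. If N is not a multiple of batch_sz, the last batch is smaller unless drop_remainder=True is passed to batch().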
