How can you map values in a tf.data.Dataset using a dictionary


Question

Here is a simple use case for the mapping I want: converting integer labels to one-hot encodings. For this particular case one should really use tf.one_hot, but I want to understand how a dataset could be mapped using a dictionary anyway.

import tensorflow as tf
import numpy as np

#CREATE A ONE-HOT ENCODING MAPPING
mike_labels = [164, 117, 132, 37, 66, 177, 225, 33, 28, 75, 7]
num_classes = len(mike_labels)
one_hots = np.eye(len(mike_labels))
one_hots = one_hots.tolist()
#used to convert labels to corresponding one-hot encoding
label_encoder = {orig: onehot for orig, onehot in zip(mike_labels, one_hots)}
print(label_encoder[164])
print(label_encoder[28])

#CREATE A FAKE DATASET
raw_data = [[164],[28],[132],[7]]
dataset = tf.data.Dataset.from_generator(lambda: raw_data, tf.float32, output_shapes=[None])

iterator = dataset.make_one_shot_iterator()
next_element = iterator.get_next()

with tf.Session() as sess:
    print(sess.run(next_element))
    print(sess.run(next_element))

The code prints four values. The first two are the desired one-hot encodings, taken directly from the dictionary. The last two are the first two elements of the dataset; each is shown as a list containing a single float.

[1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
[ 164.]
[ 28.]

The ideal answer will show how to change every value in the dataset to its corresponding one-hot encoding USING the provided dictionary, and WILL NOT use tf.one_hot.
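For reference, the target of the mapping can be written in plain Python, outside of TensorFlow. This is only a sketch of the desired result, not a dataset transformation: it rebuilds a dictionary equivalent to the one in the question (using list comprehensions instead of np.eye, an assumption made to keep it dependency-free) and applies it to raw_data. The name expected is illustrative.

```python
# Rebuild the label -> one-hot dictionary without NumPy (illustrative only).
mike_labels = [164, 117, 132, 37, 66, 177, 225, 33, 28, 75, 7]
num_classes = len(mike_labels)
one_hots = [[1.0 if i == j else 0.0 for j in range(num_classes)]
            for i in range(num_classes)]
label_encoder = dict(zip(mike_labels, one_hots))

# The same fake data as in the question: one label per element.
raw_data = [[164], [28], [132], [7]]

# What every dataset element should become after the mapping.
expected = [label_encoder[row[0]] for row in raw_data]

for row, encoding in zip(raw_data, expected):
    # Print the label and the position of the 1.0 in its encoding.
    print(row[0], "->", encoding.index(1.0))
```

The position of the 1.0 is simply the index of the label in mike_labels, which is what the dataset mapping has to reproduce element by element.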

Recommended answer

The labels can be mapped with a lambda function. dataset.map calls that function on each element of the dataset, and the lambda in turn calls another function through tf.py_func.

tf.py_func lets the tensors be treated as NumPy arrays, which is needed because a tensor cannot be used to index the dictionary. The wrapped function returns a list of floats, and tf.py_func needs the datatype of each one, so the types are given with a list comprehension:

dataset = dataset.map(lambda label: tf.py_func(practice_py_func, [label], [tf.float32 for i in range(num_classes)]))

The following function is what gets called. First we obtain a list from the received NumPy array. This list contains a single element (the label), so we take the element at position 0 and look up its one-hot encoding in the dictionary. Without a cast, TensorFlow raises an error saying that the received value is a double rather than the expected float: the dictionary holds Python floats, which are 64-bit, while tf.float32 was declared as the output type. We therefore cast each value to float32 before returning the encoding.

def practice_py_func(arg1):
    temp = arg1.tolist() #convert the numpy array to a list
    l = label_encoder[temp[0]] #look up the encoding in the dictionary
    output = [np.float32(val) for val in l] #convert each value in the encoding to a float
    return output
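The double-versus-float issue can be reproduced with NumPy alone. The sketch below is not part of the original answer: lookup_one_hot is a hypothetical variant of practice_py_func that returns a single float32 array instead of a list of scalars, and fake_element stands in for the array that tf.py_func would pass to the function. It also shows that the float label 164.0 still finds the integer key 164, since equal numbers hash the same in Python.

```python
import numpy as np

mike_labels = [164, 117, 132, 37, 66, 177, 225, 33, 28, 75, 7]
label_encoder = dict(zip(mike_labels, np.eye(len(mike_labels)).tolist()))

def lookup_one_hot(arg1):
    """Hypothetical variant of practice_py_func returning one float32 array."""
    label = arg1.tolist()[0]  # a Python float such as 164.0
    # 164.0 == 164 and both hash alike, so the integer key is found.
    return np.asarray(label_encoder[label], dtype=np.float32)

# Stand-in for the array handed to the function (assumption for illustration).
fake_element = np.array([164.0], dtype=np.float32)

# Without an explicit dtype, the Python floats become 64-bit doubles.
uncast = np.asarray(label_encoder[164])
print(uncast.dtype)  # float64

encoded = lookup_one_hot(fake_element)
print(encoded.dtype, encoded.shape)  # float32 (11,)
```

With a function like this, the output type passed to tf.py_func would presumably be a single tf.float32 rather than a list of num_classes types; that pairing is an assumption and is not tested here.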

The whole solution looks like this:

import tensorflow as tf
import numpy as np

#CREATE A ONE-HOT ENCODING MAPPING
mike_labels = [164, 117, 132, 37, 66, 177, 225, 33, 28, 75, 7]
num_classes = len(mike_labels)
one_hots = np.eye(len(mike_labels))
one_hots = one_hots.tolist()
#used to convert labels to corresponding one-hot encoding
label_encoder = {orig: onehot for orig, onehot in zip(mike_labels, one_hots)}
print(label_encoder[164])
print(label_encoder[28])

#CREATE A FAKE DATASET
raw_data = [[164],[28],[132],[7]]
dataset = tf.data.Dataset.from_generator(lambda: raw_data, tf.float32, output_shapes=[None])


def practice_py_func(arg1):
    temp = arg1.tolist() #convert the numpy array to a list
    l = label_encoder[temp[0]] #look up the encoding in the dictionary
    output = [np.float32(val) for val in l] #convert each value in the encoding to a float
    return output

dataset = dataset.map(lambda label: tf.py_func(practice_py_func, [label], [tf.float32 for i in range(num_classes)]))


iterator = dataset.make_one_shot_iterator()
next_element = iterator.get_next()

with tf.Session() as sess:
    print(sess.run(next_element))
    print(sess.run(next_element))
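The solution above is written against the TensorFlow 1.x API (tf.Session, tf.py_func, make_one_shot_iterator). As a hedged sketch of how the same dictionary lookup might look under TensorFlow 2.x with eager execution, the version below uses tf.numpy_function in place of tf.py_func. It is not the original answer's code: the helper names lookup_one_hot and encode are hypothetical, and from_tensor_slices is swapped in for from_generator to keep the example short.

```python
import numpy as np
import tensorflow as tf

mike_labels = [164, 117, 132, 37, 66, 177, 225, 33, 28, 75, 7]
num_classes = len(mike_labels)
label_encoder = dict(zip(mike_labels, np.eye(num_classes).tolist()))

def lookup_one_hot(label):
    # Runs as ordinary Python: label is a NumPy array of shape (1,).
    return np.asarray(label_encoder[int(label[0])], dtype=np.float32)

def encode(label):
    # Wrap the Python lookup so it can be used inside dataset.map.
    encoded = tf.numpy_function(lookup_one_hot, [label], tf.float32)
    # The wrapped call loses static shape information, so restore it.
    encoded.set_shape([num_classes])
    return encoded

raw_data = np.array([[164], [28], [132], [7]], dtype=np.float32)
dataset = tf.data.Dataset.from_tensor_slices(raw_data).map(encode)

# In eager mode the dataset can be iterated directly, with no session.
encoded_rows = [row.numpy().tolist() for row in dataset]
for row in encoded_rows:
    print(row)
```

As with tf.py_func, the lookup runs in the Python interpreter rather than inside the TensorFlow graph, so this keeps the dictionary but gives up graph-level performance and portability.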
