How to get original string data back from TFRecordData

Problem description

I followed the TensorFlow guide and saved my string data using:

import tensorflow as tf

def _create_string_feature(values):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[values.encode('utf-8')]))

I also used ["tf.string", "FixedLenFeature"] as my feature's original type, and "tf.string" as its converted type.

However, during training, when I run my session and create iterators, my string feature for a batch size of 2 (for example: ['food fruit', 'cupcake food']) looks like the output below. The problem is that this list is of size 1, not 2 (batch_size=2). Why are the instances in one batch stuck together rather than split apart?

[b'food fruit' b'cupcake food']

My other features, which are int or float, come back as NumPy arrays of shape (batch_size, feature_len), which is fine. I am not sure why the string features are not separated within a single batch.

Any help would be greatly appreciated.

Recommended answer

This converts the first entry of a BytesList (bytes_list) object to a string:

my_bytes_list_object.value[0].decode()

Or, if you are extracting the string from a TFRecord Example object:

my_example.features.feature['MyFeatureName'].bytes_list.value[0].decode()

From what I can see, bytes_list returns a BytesList object, from which we can read the value field. This returns a RepeatedScalarContainer, which behaves like a simple list object. In fact, wrapping it in list() converts it to a list. Instead, we can index it directly as if it were a list and use [0] to get the zeroth item. The returned item is a bytes object, which can be converted to a standard str object with the decode() method.
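As a rough illustration of the steps described above, the sketch below round-trips a string through a tf.train.Example and decodes it again. It assumes TensorFlow is installed. The feature name MyFeatureName and the sample value 'food fruit' are taken from the question and answer; the helper name create_string_feature and the serialize/parse step are only assumptions made for the demonstration.

```python
import tensorflow as tf


def create_string_feature(value: str) -> tf.train.Feature:
    # Hypothetical helper mirroring the one in the question:
    # encode the str as UTF-8 bytes and store it in a one-element BytesList.
    return tf.train.Feature(
        bytes_list=tf.train.BytesList(value=[value.encode("utf-8")])
    )


# Build an Example holding a single string feature.
example = tf.train.Example(
    features=tf.train.Features(
        feature={"MyFeatureName": create_string_feature("food fruit")}
    )
)

# Serialize and parse it back, as would happen when reading a TFRecord file.
serialized = example.SerializeToString()
restored = tf.train.Example.FromString(serialized)

# `value` is a repeated field that can be indexed and iterated like a list.
values = restored.features.feature["MyFeatureName"].bytes_list.value

# Each item is a bytes object; decode() turns it back into a str.
first = values[0].decode("utf-8")
decoded = [v.decode("utf-8") for v in values]

print(first)
print(decoded)
```

If a feature stores several strings, the same list comprehension decodes all of them, so indexing with [0] is only needed when exactly one value is expected.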
