How to get original string data back from TFRecordData
Problem description
I followed the TensorFlow guide to save my string data using:
def _create_string_feature(values):
    # Encode the Python string as UTF-8 bytes and wrap it in a single-element BytesList.
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[values.encode('utf-8')]))
I also used ["tf.string", "FixedLenFeature"] as the original feature type, and "tf.string" as the converted feature type.
However, during training, when I run my session and create iterators, my string feature for a batch size of 2 (for example: ['food fruit', 'cupcake food']) looks like the output below. The problem is that this list appears to be of size 1, not 2 (batch_size=2). Why are the instances in one batch stuck together rather than split?
[b'food fruit' b'cupcake food']
My other features, which are int or float, come back as NumPy arrays of shape (batch_size, feature_len), which is fine, but I am not sure why the string features are not separated within a single batch.
Any help would be appreciated.
Recommended answer
This will convert a BytesList or bytes_list string object to a string:
my_bytes_list_object.value[0].decode()
Or, if you are extracting the string from a TFRecord Example object:
my_example.features.feature['MyFeatureName'].bytes_list.value[0].decode()
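The extraction above can be sketched end to end as follows. This is a minimal round trip, assuming TensorFlow is installed; 'MyFeatureName' is only a placeholder key, and the helper is the one from the question with its argument renamed.

```python
import tensorflow as tf


def _create_string_feature(value):
    # Wrap one UTF-8 encoded string in a single-element BytesList feature.
    return tf.train.Feature(
        bytes_list=tf.train.BytesList(value=[value.encode('utf-8')]))


# Build an Example holding one string feature (the key is a placeholder).
example = tf.train.Example(
    features=tf.train.Features(
        feature={'MyFeatureName': _create_string_feature('food fruit')}))

# Serialize it the way it would be written to a TFRecord file.
serialized = example.SerializeToString()

# Parse the bytes back into an Example and recover the original str.
restored = tf.train.Example.FromString(serialized)
recovered = (
    restored.features.feature['MyFeatureName'].bytes_list.value[0].decode('utf-8'))
print(recovered)
```

Passing 'utf-8' to decode() explicitly mirrors the encode('utf-8') call used when the feature was written.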
From what I can see, bytes_list returns a BytesList object, from which we can read the value field. This returns a RepeatedScalarContainer, which behaves like a simple list object. In fact, wrapping it in list() converts it to a list. However, we can instead access it directly as if it were a list and use [0] to get the zeroth item. The returned item is a bytes object, which can be converted to a standard str object with the decode() method.
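As a small illustration of the list-like behaviour described above, the sketch below builds a BytesList with two values (the strings are taken from the question) and reads them back. It assumes TensorFlow is installed; the exact container class name may differ between protobuf versions, so the code does not depend on it.

```python
import tensorflow as tf

# A BytesList holding two byte strings.
bytes_list = tf.train.BytesList(value=[b'food fruit', b'cupcake food'])

# The value field is a repeated-field container that supports len(),
# indexing, and iteration like a list.
values = bytes_list.value
count = len(values)

# Index it directly and decode the bytes item to a str.
first = values[0].decode('utf-8')

# list() turns the container into a plain Python list of bytes objects.
as_list = list(values)

# Decode every item to get plain Python strings.
decoded = [v.decode('utf-8') for v in values]
print(decoded)
```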