How to store numpy arrays as tfrecord?
Problem description
I am trying to create a dataset in TFRecord format from NumPy arrays. I am trying to store 2D and 3D coordinates.
The 2D coordinates are a NumPy array of shape (2, 10) with dtype float64, and the 3D coordinates are a NumPy array of shape (3, 10) with dtype float64.
Here is my code:
```python
import tensorflow as tf  # TF 1.x API (tf.python_io), as in the original question

def _floats_feature(value):
    return tf.train.Feature(float_list=tf.train.FloatList(value=value))

train_filename = 'train.tfrecords'  # address to save the TFRecords file
writer = tf.python_io.TFRecordWriter(train_filename)
for c in range(0, 1000):
    # get 2d and 3d coordinates and save in c2d and c3d
    feature = {'train/coord2d': _floats_feature(c2d),
               'train/coord3d': _floats_feature(c3d)}
    sample = tf.train.Example(features=tf.train.Features(feature=feature))
    writer.write(sample.SerializeToString())
writer.close()
```
When I run this, I get the following error:
```
    feature = {'train/coord2d': _floats_feature(c2d),
  File "genData.py", line 19, in _floats_feature
    return tf.train.Feature(float_list=tf.train.FloatList(value=value))
  File "C:\Users\User\AppData\Local\Programs\Python\Python36\lib\site-packages\google\protobuf\internal\python_message.py", line 510, in __init__
    copy.extend(field_value)
  File "C:\Users\User\AppData\Local\Programs\Python\Python36\lib\site-packages\google\protobuf\internal\containers.py", line 275, in extend
    new_values = [self._type_checker.CheckValue(elem) for elem in elem_seq_iter]
  File "C:\Users\User\AppData\Local\Programs\Python\Python36\lib\site-packages\google\protobuf\internal\containers.py", line 275, in <listcomp>
    new_values = [self._type_checker.CheckValue(elem) for elem in elem_seq_iter]
  File "C:\Users\User\AppData\Local\Programs\Python\Python36\lib\site-packages\google\protobuf\internal\type_checkers.py", line 109, in CheckValue
    raise TypeError(message)
TypeError: array([-163.685, 240.818, -114.05 , -518.554, 107.968, 427.184,
157.418, -161.798, 87.102, 406.318]) has type <class 'numpy.ndarray'>, but expected one of: ((<class 'numbers.Real'>,),)
```
I don't know how to fix this. Should I store the features as int64 or bytes? I have no clue how to go about this since I am completely new to TensorFlow. Any help would be great! Thanks.
Recommended answer
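The `tf.train.Feature` class only supports lists (or 1-D arrays) when using the `float_list` argument. Iterating over a 2-D array yields 1-D row arrays rather than numbers, which is why the type checker rejects an element of type `numpy.ndarray`. Depending on your data, you might try one of the following approaches: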
1. Flatten the data in the array before passing it to `tf.train.Feature`:
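```python
import tensorflow as tf

def _floats_feature(value):
    # Flatten to 1-D so that every element passed to FloatList is a scalar.
    return tf.train.Feature(float_list=tf.train.FloatList(value=value.reshape(-1)))
```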
Note that you might need to add another feature to indicate how this data should be reshaped when you parse it again (you could use an `int64_list` feature for that purpose).
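A minimal end-to-end sketch of the flatten approach, assuming TensorFlow 2.x (where `tf.io.TFRecordWriter` takes the place of the TF 1.x `tf.python_io.TFRecordWriter`). The helper names (`_int64_feature`, `make_example`, `parse_example`, `write_tfrecord`, `read_tfrecord`) and the `.../shape` feature names are hypothetical, chosen for illustration. Each array is stored flattened alongside its original shape as an `int64_list`, and the parser uses that shape to restore it. Keep in mind that `FloatList` holds 32-bit floats, so float64 input is rounded to float32 on the way in.

```python
import numpy as np
import tensorflow as tf


def _floats_feature(value):
    # FloatList needs a flat sequence of scalars, so flatten first;
    # .tolist() yields plain Python floats. Stored as 32-bit floats.
    flat = np.asarray(value).reshape(-1).tolist()
    return tf.train.Feature(float_list=tf.train.FloatList(value=flat))


def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=list(value)))


def make_example(c2d, c3d):
    # Flattened data plus each array's original shape (hypothetical names).
    feature = {
        'train/coord2d': _floats_feature(c2d),
        'train/coord2d/shape': _int64_feature(c2d.shape),
        'train/coord3d': _floats_feature(c3d),
        'train/coord3d/shape': _int64_feature(c3d.shape),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))


# Both arrays are 2-D, so each stored shape has exactly two entries.
FEATURES = {
    'train/coord2d': tf.io.VarLenFeature(tf.float32),
    'train/coord2d/shape': tf.io.FixedLenFeature([2], tf.int64),
    'train/coord3d': tf.io.VarLenFeature(tf.float32),
    'train/coord3d/shape': tf.io.FixedLenFeature([2], tf.int64),
}


def parse_example(serialized):
    # Parse one serialized Example and reshape the flat data back.
    parsed = tf.io.parse_single_example(serialized, FEATURES)
    c2d = tf.reshape(tf.sparse.to_dense(parsed['train/coord2d']),
                     parsed['train/coord2d/shape'])
    c3d = tf.reshape(tf.sparse.to_dense(parsed['train/coord3d']),
                     parsed['train/coord3d/shape'])
    return c2d, c3d


def write_tfrecord(path, samples):
    # samples: iterable of (c2d, c3d) NumPy array pairs.
    with tf.io.TFRecordWriter(path) as writer:
        for c2d, c3d in samples:
            writer.write(make_example(c2d, c3d).SerializeToString())


def read_tfrecord(path):
    # Dataset of (c2d, c3d) tensors with their original shapes restored.
    return tf.data.TFRecordDataset(path).map(parse_example)
```

Because the shapes in the question are always (2, 10) and (3, 10), a simpler option is to skip the shape features and parse with `tf.io.FixedLenFeature([2, 10], tf.float32)` and `tf.io.FixedLenFeature([3, 10], tf.float32)`, which return tensors of those shapes directly. The stored-shape variant above is mainly useful when the shapes vary between records.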
2. Split the multidimensional feature into multiple 1-D features. For example, if `c2d` contains an N × 2 array of x- and y-coordinates, you could split that feature into separate `train/coord2d/x` and `train/coord2d/y` features, each containing the x- and y-coordinate data, respectively.
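A minimal sketch of the split approach, assuming the question's (D, N) layout, where row `i` holds the i-th coordinate of all N points (for an N × 2 array, pass its transpose `coords.T` instead). The `train/coord3d/z` name simply extends the answer's naming pattern, and the helper names plus the `num_points=10` default are hypothetical. Each per-axis feature is a plain 1-D list, so no reshape information needs to be stored.

```python
import numpy as np
import tensorflow as tf

AXES = ('x', 'y', 'z')


def _floats_feature(values):
    # One 1-D list of floats per feature (stored as 32-bit floats).
    return tf.train.Feature(
        float_list=tf.train.FloatList(value=np.asarray(values).tolist()))


def split_coord_features(prefix, coords):
    # coords has shape (D, N) with D <= 3: one feature per coordinate axis.
    # zip stops after D rows, so 2-D input yields only x and y.
    return {f'{prefix}/{axis}': _floats_feature(row)
            for axis, row in zip(AXES, coords)}


def make_example(c2d, c3d):
    feature = {}
    feature.update(split_coord_features('train/coord2d', c2d))
    feature.update(split_coord_features('train/coord3d', c3d))
    return tf.train.Example(features=tf.train.Features(feature=feature))


def parse_example(serialized, num_points=10):
    # Every per-axis feature holds exactly num_points floats.
    spec = {f'train/coord2d/{a}': tf.io.FixedLenFeature([num_points], tf.float32)
            for a in AXES[:2]}
    spec.update({f'train/coord3d/{a}': tf.io.FixedLenFeature([num_points], tf.float32)
                 for a in AXES})
    return tf.io.parse_single_example(serialized, spec)
```

Compared with flattening, this keeps each feature self-describing by name, at the cost of one feature per axis.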