tensorflow: Reading time series data from TFRecord


Question

I'm using a SequenceExample protobuf to read and write time-series data in a TFRecord file.

I serialized a pair of NumPy arrays as follows:

import numpy as np
import tensorflow as tf

writer = tf.python_io.TFRecordWriter(file_name)

context = tf.train.Features( ... Feature( ... ) ... )

# Each FeatureList holds a single Feature that carries the whole array.
feature_data = tf.train.FeatureList(feature=[
    tf.train.Feature(float_list=tf.train.FloatList(
        value=np.random.normal(size=(4065000,))))])
# randint's upper bound is exclusive, so 11 keeps the original 0..10 range.
labels = tf.train.FeatureList(feature=[
    tf.train.Feature(int64_list=tf.train.Int64List(
        value=np.random.randint(0, 11, size=(1084,))))])

## feature_data and labels are of similar, but varying lengths

feature_list = {"feature_data": feature_data,
                "labels": labels}

feature_lists = tf.train.FeatureLists(feature_list=feature_list)
example = tf.train.SequenceExample(context=context,
                                   feature_lists=feature_lists)

## serialize and close

When I try to read the .tfrecords file, I get quite a few errors, primarily because the SequenceExample protobuf writes the time-series data as a series of values (e.g. value: -12.2549, value: -18.1372, ..., value: 13.1234). My code to read the .tfrecords file is as follows:

dataset = tf.data.TFRecordDataset("data/tf_record.tfrecords")
dataset = dataset.map(decode)
dataset = dataset.make_one_shot_iterator().get_next()

### reshape tensors and feed to estimator###

My decode() function is defined as follows:

def decode(serialized_proto):
    context_features = {...}
    sequence_features = {"feature_data": tf.FixedLenSequenceFeature((None,),
                                                                tf.float32),
                         "labels": tf.FixedLenSequenceFeature((None,),
                                                                 tf.int64)}

    context, sequence = tf.parse_single_sequence_example(serialized_proto,
                                        context_features=context_features,
                                        sequence_features=sequence_features)

    return context, sequence

One of the errors is as follows:

Shape [?] is not fully defined for 'ParseSingleSequenceExample/ParseSingleSequenceExample' (op: 'ParseSingleSequenceExample') with input shapes: [], [0], [], [], [], [], [], [], [].

My primary question is how to think about the structure of Datasets. I'm not sure I really understand the structure of the data that is returned, and I'm having a hard time iterating through this Dataset and returning the variably-sized Tensors. Thanks in advance!

Recommended answer

You can only use tf.FixedLenSequenceFeature when the shape of the feature is known. Otherwise, use tf.VarLenFeature instead.
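
To make both halves of that advice concrete, here is a minimal sketch rather than a definitive fix. It assumes TensorFlow 2.x with eager execution and the tf.io names (the question uses the older 1.x names), and make_example is a hypothetical helper that is not part of the original post. "feature_data" keeps the question's layout, one Feature holding the whole series, so it is parsed with tf.io.VarLenFeature. For contrast, "labels" is written here as one single-value Feature per time step, which is a different layout from the question's, so its per-step shape is known and tf.io.FixedLenSequenceFeature applies.

```python
import numpy as np
import tensorflow as tf  # assumption: TensorFlow 2.x, eager execution


def make_example(values, labels):
    """Hypothetical helper: serialize one SequenceExample with two layouts."""
    feature_lists = tf.train.FeatureLists(feature_list={
        # Layout from the question: a single Feature holds the whole series,
        # so the number of values per step is not known ahead of time.
        "feature_data": tf.train.FeatureList(feature=[
            tf.train.Feature(float_list=tf.train.FloatList(value=values))]),
        # Alternative layout (not the question's): one Feature per time step
        # with exactly one value, so the per-step shape is a known scalar.
        "labels": tf.train.FeatureList(feature=[
            tf.train.Feature(int64_list=tf.train.Int64List(value=[label]))
            for label in labels]),
    })
    example = tf.train.SequenceExample(feature_lists=feature_lists)
    return example.SerializeToString()


def decode(serialized_proto):
    sequence_features = {
        # Unknown per-step length: parsed into a SparseTensor.
        "feature_data": tf.io.VarLenFeature(tf.float32),
        # Known per-step shape (scalar): parsed into a dense [num_steps] tensor.
        "labels": tf.io.FixedLenSequenceFeature([], tf.int64),
    }
    _, sequence = tf.io.parse_single_sequence_example(
        serialized_proto, sequence_features=sequence_features)
    # Densify the sparse result and flatten it to a 1-D series.
    feature_data = tf.reshape(
        tf.sparse.to_dense(sequence["feature_data"]), [-1])
    return feature_data, sequence["labels"]


rng = np.random.default_rng(0)
records = [
    make_example(rng.normal(size=12).tolist(),
                 rng.integers(0, 11, size=5).tolist()),
    make_example(rng.normal(size=7).tolist(),
                 rng.integers(0, 11, size=3).tolist()),
]

# Each dataset element is a (feature_data, labels) tuple whose lengths
# can differ from one record to the next.
dataset = tf.data.Dataset.from_tensor_slices(records).map(decode)
lengths = [(int(f.shape[0]), int(l.shape[0])) for f, l in dataset]
print(lengths)
```

Because the decoded tensors vary in length, batching them would need something like padded_batch rather than a plain batch; the sketch leaves the elements unbatched to keep the returned structure easy to inspect.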
