Using tf.data.experimental.make_csv_dataset for time series data

Question

How do I use tf.data.experimental.make_csv_dataset with CSV files containing time series data?

building_dataset = tf.data.experimental.make_csv_dataset(
    file_pattern=csv_file, batch_size=5, num_epochs=1,
    shuffle=False, select_columns=feature_columns)

Answer

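This assumes the CSV file is already sorted by time. First, read the CSV file with make_csv_dataset, setting shuffle=False so the rows keep their time order and num_epochs=1 so the file is read only once: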

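import tensorflow as tf

# csv_file and feature_columns are your own CSV path and list of column names.
building_dataset = tf.data.experimental.make_csv_dataset(
    file_pattern=csv_file,
    batch_size=5,
    num_epochs=1,    # read the file once (the default, None, repeats forever)
    shuffle=False,   # keep the rows in time order (the default is True)
    select_columns=feature_columns,
)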
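
To check what this first step produces, here is a minimal sketch that writes a tiny, hypothetical CSV file (the column names time, temp and load and all values are made up for illustration) and reads it back with the same kind of arguments. Each element should be a dict mapping every selected column name to a tensor of batch_size consecutive values.

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical toy CSV: three columns, rows already sorted by time.
csv_text = "time,temp,load\n0,10.0,1.0\n1,11.0,2.0\n2,12.0,3.0\n3,13.0,4.0\n"
csv_file = os.path.join(tempfile.mkdtemp(), "toy_series.csv")
with open(csv_file, "w") as f:
    f.write(csv_text)

feature_columns = ["temp", "load"]  # hypothetical column selection

dataset = tf.data.experimental.make_csv_dataset(
    file_pattern=csv_file,
    batch_size=2,
    num_epochs=1,   # read the file once instead of repeating forever
    shuffle=False,  # keep rows in file (time) order
    select_columns=feature_columns,
)

# Each element maps column name -> tensor of shape [batch_size].
first_batch = next(iter(dataset))
for name, values in first_batch.items():
    print(name, values.numpy())
```

With shuffle=False the first batch holds the first two rows of the file, so the time order survives this step.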

Next, define pack_features_vector, which casts every column to float32 and stacks the columns into a single feature vector per row. Then unbatch the dataset with flat_map() so that each element is one row (one time step):

def pack_features_vector(features):
    """Pack the dict of columns into one float32 tensor of shape [batch, num_columns]."""
    features = tf.stack([tf.cast(x, tf.float32) for x in features.values()], axis=1)
    return features


building_dataset = building_dataset.map(pack_features_vector)
# Unbatch: split each [batch, num_columns] tensor back into individual rows.
building_dataset = building_dataset.flat_map(lambda x: tf.data.Dataset.from_tensor_slices(x))
for feature in building_dataset.take(1):
    print('Stacked tensor:', feature)
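
The packing and unbatching step can be tried without a CSV file. This sketch assumes a hypothetical in-memory dict of two columns (load and temp) standing in for the make_csv_dataset output. It also shows Dataset.unbatch(), available in TF 2.x, as a shorter alternative to the flat_map()/from_tensor_slices() combination; both should yield the same rows.

```python
import tensorflow as tf

# Hypothetical stand-in for the make_csv_dataset output: a dict of columns
# (one int column, one float column), batched two rows at a time.
# Keys are listed alphabetically, so the stacked column order is load, temp.
columns = {"load": [1.0, 2.0, 3.0, 4.0], "temp": [10, 11, 12, 13]}
batched = tf.data.Dataset.from_tensor_slices(columns).batch(2)


def pack_features_vector(features):
    """Stack every column into one float32 tensor of shape [batch, num_columns]."""
    return tf.stack([tf.cast(x, tf.float32) for x in features.values()], axis=1)


packed = batched.map(pack_features_vector)

# Two ways to go from [batch, num_columns] elements to one row per element.
rows_flat_map = packed.flat_map(lambda x: tf.data.Dataset.from_tensor_slices(x))
rows_unbatch = packed.unbatch()

rows_a = [r.numpy().tolist() for r in rows_flat_map]
rows_b = [r.numpy().tolist() for r in rows_unbatch]
print(rows_a)
```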

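Then use window() to build sliding windows of window_size consecutive rows (shift=1 moves the window forward one time step at a time), and flat_map() to turn each window into a single tensor. window_size is the number of time steps per sample; set it to suit your data.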

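window_size = 10  # hypothetical value: number of consecutive rows in each window
building_dataset = building_dataset.window(window_size, shift=1, drop_remainder=True)
# Each window is a small dataset; batch it into one [window_size, num_columns] tensor.
building_dataset = building_dataset.flat_map(lambda window: window.batch(window_size))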
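
How window() and flat_map() interact is easier to see on a toy dataset. This sketch uses Dataset.range(5) as a stand-in for the per-row dataset and a hypothetical window_size of 3. In the real pipeline each element is a feature vector, so each window becomes a [window_size, num_columns] tensor instead of a list of scalars.

```python
import tensorflow as tf

window_size = 3  # hypothetical window length for illustration

# Stand-in for the per-row dataset: scalars 0..4 in time order.
rows = tf.data.Dataset.range(5)

# window() yields a dataset of sub-datasets, each holding window_size rows;
# drop_remainder=True discards the incomplete windows at the end.
windows = rows.window(window_size, shift=1, drop_remainder=True)

# flat_map + batch(window_size) turns each sub-dataset into a single tensor.
windows = windows.flat_map(lambda w: w.batch(window_size))

window_list = [w.numpy().tolist() for w in windows]
print(window_list)  # [[0, 1, 2], [1, 2, 3], [2, 3, 4]]
```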

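Then use map() to split each window into features and a label. The features are all columns except the last, for every row in the window; the label is the last column of the final row. This assumes the target is the last of the packed columns.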

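# features: all columns except the last, for every row in the window
# label: the last column of the final row, shape (1,)
building_dataset = building_dataset.map(lambda window: (window[:, :-1], window[-1:, -1]))
for feature, label in building_dataset.take(5):
    print(feature.shape)
    print('feature:', feature[:, 0:4])  # first four feature columns only
    print('label:', label)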
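
The slicing in the lambda can be checked on a single hand-made window. This sketch assumes a hypothetical 3-by-3 window (three time steps, three columns, with the last column as the target) and a hypothetical helper split_window that does the same slicing as the lambda.

```python
import tensorflow as tf

# Hypothetical single window: 3 time steps x 3 columns, last column = target.
window = tf.reshape(tf.range(9, dtype=tf.float32), (3, 3))
# [[0, 1, 2],
#  [3, 4, 5],
#  [6, 7, 8]]


def split_window(window):
    features = window[:, :-1]  # every row, all columns but the last
    label = window[-1:, -1]    # last row, last column; shape (1,)
    return features, label


features, label = split_window(window)
print(features.shape, label.shape)  # (3, 2) (1,)
print(label.numpy())                # [8.]
```

Note that window[:, :-1] includes the non-target columns of the final row, so the label's own time step contributes features. If the model should only see earlier time steps, window[:-1, :-1] would drop that row.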

Finally, group the (feature, label) pairs into batches with batch() and pass the resulting dataset to model training (for example, to model.fit()).

building_dataset = building_dataset.batch(32)
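
Putting the steps together, here is a sketch of the whole windowing pipeline on a small synthetic series (all sizes are hypothetical), followed by a tiny Keras model trained on it for one epoch to confirm the shapes line up. The inner batch() uses drop_remainder=True only so that the window length is known statically; since window() already drops incomplete windows, it does not change the data.

```python
import tensorflow as tf

window_size = 4                # hypothetical
num_rows, num_columns = 20, 3  # hypothetical synthetic series; last column = target

# Synthetic rows in time order, standing in for the unbatched CSV rows.
data = tf.reshape(tf.range(num_rows * num_columns, dtype=tf.float32), (num_rows, num_columns))
ds = tf.data.Dataset.from_tensor_slices(data)

ds = ds.window(window_size, shift=1, drop_remainder=True)
ds = ds.flat_map(lambda w: w.batch(window_size, drop_remainder=True))
ds = ds.map(lambda w: (w[:, :-1], w[-1:, -1]))  # (features, label)
ds = ds.batch(8)  # smaller than 32 only because the toy series is short

features, labels = next(iter(ds))
print(features.shape, labels.shape)  # (8, 4, 2) (8, 1)

# Minimal model: flatten each window and regress the label.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(window_size, num_columns - 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
history = model.fit(ds, epochs=1, verbose=0)
```

With 20 rows and a window of 4 there are 17 windows, so the batched dataset holds batches of 8, 8 and 1.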
