TensorFlow Dataset API - Apply windows to multiple sequences

Question

I want to set up a data pipeline that works with sequential data. Each data point in a sequence has a fixed dimensionality, e.g. 64x64. I have multiple sequences of variable length, so my dataset can be simplified to:

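import numpy as np
import tensorflow as tf

# Three toy sequences of different lengths; each frame has shape (1, 1).
seq1 = np.arange(5)[:, None, None]
seq2 = np.arange(8)[:, None, None]
seq3 = np.arange(7)[:, None, None]
sequences = [seq1, seq2, seq3]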

Now I want to operate on a series of time frames within each sequence, which yields 3-dimensional data cubes of shape [N_frames, data_dim1, data_dim2].

For a single sequence, I found the window method in TF's Dataset API, which lets me build the data cubes by windowing:

window = 3
shift = 1
ds = tf.data.Dataset.from_tensor_slices(seq1)
# window() yields nested datasets; batch(window) turns each one into a single tensor.
ds = ds.window(size=window, shift=shift, drop_remainder=True).flat_map(lambda x: x.batch(window))
for d in ds:
    print(d)

Result:

tf.Tensor(
[[[0]]

 [[1]]

 [[2]]], shape=(3, 1, 1), dtype=int32)
tf.Tensor(
[[[1]]

 [[2]]

 [[3]]], shape=(3, 1, 1), dtype=int32)
tf.Tensor(
[[[2]]

 [[3]]

 [[4]]], shape=(3, 1, 1), dtype=int32)
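
As a sanity check on the output above: with drop_remainder=True and the default stride of 1, a sequence of length n should give (n - size) // shift + 1 full windows, or none if it is shorter than one window. The sketch below applies that arithmetic to the three sequence lengths from the question. The helper name num_windows is hypothetical and not part of TensorFlow.

```python
def num_windows(length, size, shift=1):
    # Number of full windows, assuming drop_remainder=True and stride=1.
    if length < size:
        return 0
    return (length - size) // shift + 1

# Lengths of seq1, seq2 and seq3 from the question.
lengths = [5, 8, 7]
counts = [num_windows(n, size=3, shift=1) for n in lengths]
print(counts)  # [3, 6, 5]
```

The first entry matches the three cubes printed for seq1, so the full set of sequences should produce 3 + 6 + 5 = 14 cubes.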

Now I am struggling to transfer this operation to my full set of sequences. How can I get all the data cubes from my set of sequences?

Recommended answer

I found an answer myself. I apply the window function to each sequence separately. I wrap this procedure in a small function, which is then applied to my set of sequences via flat_map:

sequences = [np.arange(5)[:, None, None], np.arange(20, 24)[:, None, None]]

def get_data_cubes(sequence, size, shift=None, stride=1, drop_remainder=False):
    # Slice one sequence into frames, window them, and batch each window into one cube.
    ds = tf.data.Dataset.from_tensor_slices(sequence)
    ds = ds.window(size=size, shift=shift, stride=stride, drop_remainder=drop_remainder)
    ds = ds.flat_map(lambda x: x.batch(size))
    return ds

window = 3
shift = 1
# Each element of this dataset is one whole sequence; the first dimension is variable.
dataset = tf.data.Dataset.from_generator(lambda: sequences, tf.as_dtype(sequences[0].dtype), tf.TensorShape([None, 1, 1]))
# Window every sequence separately and flatten the cubes into a single dataset.
dataset = dataset.flat_map(lambda x: get_data_cubes(x, window, shift=shift, drop_remainder=True))

for d in dataset:
    print(d)

Result:

tf.Tensor(
[[[0]]

 [[1]]

 [[2]]], shape=(3, 1, 1), dtype=int32)
tf.Tensor(
[[[1]]

 [[2]]

 [[3]]], shape=(3, 1, 1), dtype=int32)
tf.Tensor(
[[[2]]

 [[3]]

 [[4]]], shape=(3, 1, 1), dtype=int32)
tf.Tensor(
[[[20]]

 [[21]]

 [[22]]], shape=(3, 1, 1), dtype=int32)
tf.Tensor(
[[[21]]

 [[22]]

 [[23]]], shape=(3, 1, 1), dtype=int32)

This is exactly the result I was looking for. By the way, this dataset can be treated like a standard TF dataset, with shuffling, batching, etc.
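
To illustrate the last point, here is a minimal sketch that shuffles and batches the cubes. It rests on a few assumptions that are not part of the answer: it uses the output_signature argument of from_generator (available in newer TF 2.x releases, assumed here to be TF 2.4 or later) instead of the positional type and shape arguments, it uses a simplified version of get_data_cubes, and the buffer size, seed and batch size are arbitrary illustrative values.

```python
import numpy as np
import tensorflow as tf

# Same toy sequences as in the answer: variable length, fixed frame shape (1, 1).
sequences = [np.arange(5)[:, None, None], np.arange(20, 24)[:, None, None]]

def get_data_cubes(sequence, size, shift=1):
    # Simplified variant of the answer's helper: window one sequence and
    # turn each window back into a single tensor of shape (size, 1, 1).
    ds = tf.data.Dataset.from_tensor_slices(sequence)
    ds = ds.window(size=size, shift=shift, drop_remainder=True)
    return ds.flat_map(lambda w: w.batch(size))

# Assumption: output_signature is supported by the installed TF version.
dataset = tf.data.Dataset.from_generator(
    lambda: sequences,
    output_signature=tf.TensorSpec(
        shape=(None, 1, 1), dtype=tf.as_dtype(sequences[0].dtype)
    ),
)
cubes = dataset.flat_map(lambda s: get_data_cubes(s, size=3, shift=1))

# Shuffle the cubes, then group them into batches of 2 cubes each.
batched = cubes.shuffle(buffer_size=16, seed=0).batch(2)

batch_shapes = [tuple(b.shape) for b in batched]
print(batch_shapes)
```

The two sequences yield five cubes in total, so a batch size of 2 should give two full batches and one partial batch. Passing drop_remainder=True to batch would discard the partial one.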
