PyTorch: Dataloader for time series task

Question

I have a Pandas dataframe with n rows and k columns loaded into memory. I would like to get batches for a forecasting task where the first training example of a batch should have shape (q, k) with q referring to the number of rows from the original dataframe (e.g. 0:128). The next example should be (128:256, k) and so on. So, ultimately, one batch should have the shape (32, q, k) with 32 corresponding to the batch size.

Since TensorDataset from data_utils does not work here, I am wondering what the best way would be. I tried to use np.array_split() to get as first dimension the number of possible splits of q values in order to write a custom DataLoader but then reshaping is not guaranteed to work since not all arrays have the same shape.
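The reshaping problem described above can be reproduced in a few lines. This is a minimal sketch with illustrative values only (a 10-row array and q = 3, neither taken from the question): np.array_split() accepts an uneven division, so the chunks end up with different lengths and cannot be stacked into one (num_windows, q, k) array. One possible workaround, shown here as an assumption rather than a recommendation, is to drop the leftover rows before reshaping.

```python
import numpy as np

data = np.arange(30).reshape(10, 3)  # 10 rows, k = 3 columns (illustrative)
q = 3                                # assumed window length

# array_split tolerates uneven division, so chunk lengths differ.
n_chunks = int(np.ceil(len(data) / q))
chunks = np.array_split(data, n_chunks)
chunk_shapes = [c.shape for c in chunks]
print(chunk_shapes)  # [(3, 3), (3, 3), (2, 3), (2, 3)]

# Dropping the incomplete tail makes the reshape well defined.
usable_rows = (len(data) // q) * q
windows = data[:usable_rows].reshape(-1, q, data.shape[1])
print(windows.shape)  # (3, 3, 3)
```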

Here is a minimal example to make this clearer. In this case, the batch size is 3 and q is 2:

import pandas as pd
import numpy as np
df = pd.DataFrame(data=np.arange(0,30).reshape(10,3),columns=['A','B','C'])

The dataset:

    A   B   C
0   0   1   2
1   3   4   5
2   6   7   8
3   9   10  11
4   12  13  14
5   15  16  17
6   18  19  20
7   21  22  23
8   24  25  26
9   27  28  29

The first batch in this case should have the shape (3,2,3) and look like:

array([[[ 0.,  1.,  2.],
        [ 3.,  4.,  5.]],

       [[ 3.,  4.,  5.],
        [ 6.,  7.,  8.]],

       [[ 6.,  7.,  8.],
        [ 9., 10., 11.]]])

Recommended answer

You can write your own analog of TensorDataset. To do this, you need to inherit from the Dataset class.

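from torch.utils.data import Dataset, DataLoader

class MyDataset(Dataset):
    def __init__(self, data_frame, q):
        # Keep the frame as an (n, k) NumPy array; q is the window length in rows.
        self.data = data_frame.values
        self.q = q

    def __len__(self):
        # Number of complete, non-overlapping windows; leftover rows are dropped.
        return self.data.shape[0] // self.q

    def __getitem__(self, index):
        # Window `index` covers rows [index*q, (index+1)*q) and has shape (q, k).
        return self.data[index * self.q: (index+1) * self.q]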
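A usage sketch may help here. The class above yields non-overlapping windows (rows 0:2, 2:4, 4:6, ...), which matches the 0:128, 128:256 description in the question, while the expected output in the question's minimal example shows windows that overlap with a step of one row. The code below first wraps MyDataset in a DataLoader, then shows a variant for the overlapping case. SlidingWindowDataset and its stride parameter are hypothetical names introduced only for illustration; they are not part of the original answer or of PyTorch.

```python
import numpy as np
import pandas as pd
import torch
from torch.utils.data import Dataset, DataLoader

df = pd.DataFrame(data=np.arange(0, 30).reshape(10, 3), columns=['A', 'B', 'C'])

class MyDataset(Dataset):
    """Non-overlapping windows, as in the answer above."""
    def __init__(self, data_frame, q):
        self.data = data_frame.values
        self.q = q

    def __len__(self):
        return self.data.shape[0] // self.q

    def __getitem__(self, index):
        return self.data[index * self.q: (index + 1) * self.q]

class SlidingWindowDataset(Dataset):
    """Hypothetical variant: windows of q rows that advance by `stride` rows."""
    def __init__(self, data_frame, q, stride=1):
        # Assumption: float32 tensors are wanted, as in the question's output.
        self.data = torch.as_tensor(data_frame.values, dtype=torch.float32)
        self.q = q
        self.stride = stride

    def __len__(self):
        # Count only windows that fit entirely inside the data.
        return max(0, (self.data.shape[0] - self.q) // self.stride + 1)

    def __getitem__(self, index):
        start = index * self.stride
        return self.data[start:start + self.q]

# shuffle=False keeps the windows in temporal order within each batch.
block_loader = DataLoader(MyDataset(df, q=2), batch_size=3, shuffle=False)
first_block = next(iter(block_loader))
print(first_block.shape)    # torch.Size([3, 2, 3]); rows 0:2, 2:4, 4:6

overlap_loader = DataLoader(SlidingWindowDataset(df, q=2), batch_size=3, shuffle=False)
first_overlap = next(iter(overlap_loader))
print(first_overlap.shape)  # torch.Size([3, 2, 3]); rows 0:2, 1:3, 2:4
```

With 10 rows and q = 2, MyDataset has 5 windows and the sliding variant has 9, so the last batch of each loader is smaller than 3 unless drop_last=True is passed to the DataLoader.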
