Merge or append multiple Keras TimeseriesGenerator objects into one

Question

I'm trying to make an LSTM model. The data comes from a CSV file that contains values for multiple stocks.

I can't use the rows as they appear in the file to make sequences, because each sequence is only relevant in the context of its own stock, so I need to select the rows for each stock and build the sequences from those.

I have something like this:

import numpy as np
from keras.preprocessing.sequence import TimeseriesGenerator

for stock in stocks:
    # Keep only this stock's rows; sequences must not cross stocks.
    stock_df = df.loc[df['symbol'] == stock].copy()
    target = stock_df.pop('price')

    x = np.array(stock_df.values)
    y = np.array(target.values)

    sequence = TimeseriesGenerator(x, y, length=4, sampling_rate=1, batch_size=1)

That works fine, but I then want to merge each of those sequences into a bigger one that I will use for training and that contains the data for all of the stocks.

It is not possible to use append or merge, because the function returns a generator object, not a NumPy array.
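Since `TimeseriesGenerator` is a Keras `Sequence` (it supports `len()` and integer indexing), one conceivable workaround is a small wrapper that presents several per-stock sequences as one indexable sequence. This is only a sketch; the class name is hypothetical, and plain lists stand in for the per-stock generators:

```python
class MergedSequences:
    """Present several indexable batch sequences as one (hypothetical sketch)."""

    def __init__(self, seqs):
        self.seqs = seqs
        self.lens = [len(s) for s in seqs]

    def __len__(self):
        return sum(self.lens)

    def __getitem__(self, idx):
        # Walk the sub-sequences until the flat index falls inside one of them.
        for seq, n in zip(self.seqs, self.lens):
            if idx < n:
                return seq[idx]
            idx -= n
        raise IndexError(idx)

# Stand-ins for two per-stock generators, each holding (x, y) batches.
merged = MergedSequences([[("a0", 0), ("a1", 1)], [("b0", 10)]])
```

Indexing into `merged` then transparently dispatches to the right sub-sequence, which is the same contract Keras expects when iterating a `Sequence` during training.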

Answer

New answer:

So what I've ended up doing is all the preprocessing manually, saving an .npy file for each stock containing the preprocessed sequences; then, using a manually created generator, I make batches like this:

import numpy as np
import tensorflow as tf

class seq_generator():

  def __init__(self, list_of_filepaths):
    # Track which sequence indices have already been served per file.
    self.usedDict = dict()
    for path in list_of_filepaths:
      self.usedDict[path] = []

  def generate(self):
    while True:
      # Pick a random stock file, then a random sequence within it.
      path = np.random.choice(list(self.usedDict.keys()))
      stock_array = np.load(path)
      random_sequence = np.random.randint(stock_array.shape[0])
      if random_sequence not in self.usedDict[path]:
        self.usedDict[path].append(random_sequence)
        yield stock_array[random_sequence, :, :]

train_generator = seq_generator(list_of_filepaths)

# Pass the bound method of the instance (train_generator.generate), not
# seq_generator.generate(). The generator yields a single tensor per step,
# so output_types/output_shapes must describe one tensor, not a pair.
train_dataset = tf.data.Dataset.from_generator(
    train_generator.generate,
    output_types=tf.float32,
    output_shapes=tf.TensorShape([n_timesteps, n_features]))

train_dataset = train_dataset.batch(batch_size)

Where list_of_filepaths is simply a list of paths to the preprocessed .npy data.
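The answer doesn't show the preprocessing itself. A minimal sketch of what each per-stock .npy file might contain, assuming a plain sliding window equivalent to `TimeseriesGenerator` with `sampling_rate=1` (the function name, file name, and shapes here are illustrative):

```python
import numpy as np

def build_windows(features, targets, length=4):
    # Sample i is features[i : i+length]; its target is targets[i+length],
    # matching TimeseriesGenerator's windowing with sampling_rate=1.
    n = len(features) - length
    x = np.stack([features[i:i + length] for i in range(n)])
    y = targets[length:length + n]
    return x, y

# Hypothetical stock: 10 timesteps, 3 features per timestep.
feats = np.arange(30, dtype=np.float32).reshape(10, 3)
targs = np.arange(10, dtype=np.float32)
x, y = build_windows(feats, targs, length=4)
# np.save("AAPL.npy", x)  # one file per stock, loaded later by the generator
```

Each saved array then has shape `(n_sequences, n_timesteps, n_features)`, which is what the generator above indexes into with `stock_array[random_sequence, :, :]`.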
