Creating a TimeseriesGenerator with multiple inputs
Question
I'm trying to train an LSTM model on daily fundamental and price data from ~4000 stocks. Due to memory limits, I cannot hold everything in memory after converting the data to sequences for the model.
This led me to using a generator instead, like the TimeseriesGenerator from Keras/TensorFlow. The problem is that if I try using the generator on all of my data stacked together, it creates sequences of mixed stocks. See the example below with a sequence length of 5: here, sequence 3 would include the last 4 observations of "stock 1" and the first observation of "stock 2".
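To make the boundary problem concrete, here is a minimal NumPy sketch (the stock values and lengths are made up) of the sliding windows a single generator would produce over two stacked stocks:

```python
import numpy as np

# Two hypothetical stocks with 7 daily observations each (1 feature)
stock1 = np.arange(1, 8, dtype=float).reshape(-1, 1)      # values 1..7
stock2 = np.arange(101, 108, dtype=float).reshape(-1, 1)  # values 101..107
stacked = np.vstack([stock1, stock2])                     # shape (14, 1)

seq_len = 5
# Sliding windows over the stacked array, as one TimeseriesGenerator would see it
windows = [stacked[i:i + seq_len] for i in range(len(stacked) - seq_len + 1)]

# Window 3 crosses the stock boundary: the last 4 rows of stock1
# followed by the first row of stock2
print(windows[3].ravel().tolist())  # [4.0, 5.0, 6.0, 7.0, 101.0]
```

Any window whose start index falls within `seq_len - 1` rows of a stock boundary mixes the two stocks in this way.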
Instead, what I would want is something like this:
Somewhat similar question: Merge or append multiple Keras TimeseriesGenerator objects
I explored the option of combining generators as this SO answer suggests: How do I combine two keras generator functions. However, this is not ideal in the case of ~4000 generators.
I hope my question makes sense.
Answer
So what I ended up doing is all the preprocessing manually, saving an .npy file for each stock containing its preprocessed sequences. Then, using a manually created generator, I make batches like this:
import numpy as np
import tensorflow as tf

class seq_generator():
    def __init__(self, list_of_filepaths):
        # Track which sequence indices have already been used per stock file
        self.usedDict = dict()
        for path in list_of_filepaths:
            self.usedDict[path] = []

    def generate(self):
        while True:
            # Pick a random stock file and load its preprocessed sequences
            path = np.random.choice(list(self.usedDict.keys()))
            stock_array = np.load(path)
            # Pick a random sequence index from that stock
            random_sequence = np.random.randint(stock_array.shape[0])
            if random_sequence not in self.usedDict[path]:
                self.usedDict[path].append(random_sequence)
                yield stock_array[random_sequence, :, :]

train_generator = seq_generator(list_of_filepaths)

# Pass the bound method (a callable), not a call on the class
train_dataset = tf.data.Dataset.from_generator(train_generator.generate,
                                               output_types=tf.float32,
                                               output_shapes=(n_timesteps, n_features))
train_dataset = train_dataset.batch(batch_size)
Where list_of_filepaths is simply a list of paths to the preprocessed .npy data.
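The per-stock .npy files can be produced by windowing each stock separately before training; a hypothetical sketch (the function name, shapes, and file name below are my own, not from the answer):

```python
import numpy as np

def save_stock_sequences(stock_data, seq_len, out_path):
    """Window one stock's (n_days, n_features) array into overlapping
    sequences of shape (n_sequences, seq_len, n_features) and save them."""
    sequences = np.stack([stock_data[i:i + seq_len]
                          for i in range(len(stock_data) - seq_len + 1)])
    np.save(out_path, sequences)
    return sequences.shape

# e.g. 30 days x 4 features for one stock -> 26 sequences of length 5
shape = save_stock_sequences(np.random.rand(30, 4), 5, "stock_0001.npy")
print(shape)  # (26, 5, 4)
```

Because each stock is windowed on its own, no sequence can span two stocks.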
This will:

- Load a random stock's preprocessed .npy data
- Pick a sequence at random
- Check if the index of the sequence has already been used in usedDict
- If not:
  - Append the index of that sequence to usedDict to keep track, so the same data is not fed to the model twice
  - Yield the sequence
This means that the generator will feed a single unique sequence from a random stock at each "call", enabling me to use the .from_generator() and .batch() methods of TensorFlow's Dataset type.