Merge or append multiple Keras TimeseriesGenerator objects into one
Question
I'm trying to make an LSTM model. The data comes from a csv file that contains values for multiple stocks.
I can't use all the rows as they appear in the file to make sequences because each sequence is only relevant in the context of its own stock, so I need to select the rows for each stock and make the sequences based on that.
I have something like this:
from keras.preprocessing.sequence import TimeseriesGenerator
import numpy as np

for stock in stocks:
    stock_df = df.loc[df['symbol'] == stock].copy()
    target = stock_df.pop('price')
    x = np.array(stock_df.values)
    y = np.array(target.values)
    sequence = TimeseriesGenerator(x, y, length=4, sampling_rate=1, batch_size=1)
That works fine, but then I want to merge each of those sequences into a bigger one that I will use for training and that contains the data for all the stocks.
It is not possible to use append or merge because the function returns a generator object, not a numpy array.
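One workaround (my own sketch, not from the question) is to merge by index rather than by appending arrays: wrap the per-stock generators in an object that exposes a combined length and integer indexing, which TimeseriesGenerator supports. In practice such a wrapper would subclass tf.keras.utils.Sequence so model.fit accepts it; the duck-typed version below shows just the index arithmetic, with plain lists standing in for the hypothetical per-stock generators:

```python
class MergedGenerators:
    """Present several batch sources as one indexable sequence.

    Each wrapped object only needs len() and integer indexing, which
    TimeseriesGenerator provides. To pass the result to model.fit,
    subclass tf.keras.utils.Sequence instead of a plain class.
    """
    def __init__(self, *generators):
        self.generators = generators

    def __len__(self):
        return sum(len(g) for g in self.generators)

    def __getitem__(self, index):
        # Walk the generators, translating the global batch index
        # into a local index within the right generator.
        for g in self.generators:
            if index < len(g):
                return g[index]
            index -= len(g)
        raise IndexError(index)


# Plain lists stand in for per-stock TimeseriesGenerator objects.
merged = MergedGenerators(["a0", "a1"], ["b0"])
```

Here len(merged) is 3, and merged[2] resolves to "b0", the first batch of the second source.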
Answer
New answer:

So what I've ended up doing is all the preprocessing manually, saving an .npy file for each stock containing the preprocessed sequences; then, using a manually created generator, I make batches like this:
import numpy as np
import tensorflow as tf

class seq_generator():
    def __init__(self, list_of_filepaths):
        self.usedDict = dict()
        for path in list_of_filepaths:
            self.usedDict[path] = []

    def generate(self):
        while True:
            # Pick a random stock file and a random sequence within it.
            path = np.random.choice(list(self.usedDict.keys()))
            stock_array = np.load(path)
            random_sequence = np.random.randint(stock_array.shape[0])
            if random_sequence not in self.usedDict[path]:
                # Remember the index so the same sequence is never fed twice.
                self.usedDict[path].append(random_sequence)
                yield stock_array[random_sequence, :, :]

train_generator = seq_generator(list_of_filepaths)
train_dataset = tf.data.Dataset.from_generator(train_generator.generate,
                                               output_types=tf.float32,
                                               output_shapes=(n_timesteps, n_features))
train_dataset = train_dataset.batch(batch_size)
Where list_of_filepaths is simply a list of paths to the preprocessed .npy data.
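Those .npy files might be produced in a preprocessing pass like the sketch below. The make_windows helper, the file naming, and the random placeholder data are my assumptions, not from the original answer; each saved array has shape (samples, n_timesteps, n_features), matching what the generator above yields per sequence:

```python
import os
import tempfile
import numpy as np

def make_windows(x, length=4):
    """Stack rolling windows of x into shape (samples, length, features)."""
    return np.stack([x[i - length:i] for i in range(length, len(x))])

# Random placeholder data: {stock symbol: (rows, features) array}.
raw = {"AAPL": np.random.rand(10, 3), "MSFT": np.random.rand(8, 3)}

outdir = tempfile.mkdtemp()
list_of_filepaths = []
for stock, x in raw.items():
    seqs = make_windows(x)                      # e.g. (6, 4, 3) for 10 rows
    path = os.path.join(outdir, stock + ".npy")
    np.save(path, seqs)                         # one file per stock
    list_of_filepaths.append(path)
```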
This will:

- Load a random stock's preprocessed .npy data
- Pick a sequence at random
- Check if the index of the sequence has already been used in usedDict
- If not:
  - Append the index of that sequence to usedDict to keep track, so as not to feed the same data twice to the model
  - Yield the sequence
This means that the generator will feed a single unique sequence from a random stock at each "call", enabling me to use the .from_generator() and .batch() methods from TensorFlow's Dataset type.

Old answer:
I think the answer from @TF_Support slightly misses the point. If I understand your question, it's not that you want to train one model per stock; you want one model trained on the entire dataset.
If you have enough memory you could manually create the sequences and hold the entire dataset in memory. The issue I'm facing is similar; I simply can't hold everything in memory: Creating a TimeseriesGenerator with multiple inputs.
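Assuming memory is not a constraint, the per-stock sequences could be built by hand and then concatenated into single training arrays, since plain numpy arrays (unlike generator objects) do support concatenation. In this sketch, make_sequences mirrors what TimeseriesGenerator produces with length=4 and sampling_rate=1 (window x[i-4:i] paired with target y[i]), and the random arrays are placeholders:

```python
import numpy as np

def make_sequences(x, y, length=4):
    """Build (samples, length, features) windows with aligned targets,
    mirroring TimeseriesGenerator with sampling_rate=1."""
    xs, ys = [], []
    for i in range(length, len(x)):
        xs.append(x[i - length:i])
        ys.append(y[i])
    return np.array(xs), np.array(ys)

# Placeholder per-stock data: 3 features, different row counts.
stock_a_x, stock_a_y = np.random.rand(10, 3), np.random.rand(10)
stock_b_x, stock_b_y = np.random.rand(8, 3), np.random.rand(8)

xa, ya = make_sequences(stock_a_x, stock_a_y)   # 6 windows from 10 rows
xb, yb = make_sequences(stock_b_x, stock_b_y)   # 4 windows from 8 rows

# Plain arrays, so merging is just concatenation along the sample axis.
x_all = np.concatenate([xa, xb])
y_all = np.concatenate([ya, yb])
```

Because each window is built within a single stock before concatenation, no sequence ever mixes rows from two stocks.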
Instead, I'm exploring the possibility of preprocessing all the data for each stock separately, saving them as .npy files, and then using a generator to load a random sample of those .npy files to batch data to the model; I'm not entirely sure how to approach this yet, though.