How to handle extremely long LSTM sequence length?


Problem Description

I have some data that is sampled at a very high rate (on the order of hundreds of times per second). This results in a sequence length that is huge (~90,000 samples) on average for any given instance. The entire sequence has a single label. I am trying to use an LSTM neural network to classify new sequences as one of these labels (multiclass classification).

However, using an LSTM with such a large sequence length results in a network that is quite large.

What are some methods to effectively 'chunk' these sequences so that I can reduce the sequence length seen by the neural network, yet still retain the information captured in the entire instance?

Recommended Answer

When you have very long sequences, RNNs can face the problems of vanishing and exploding gradients.

There are methods to deal with this. The first thing to understand is why they are needed: because of the problems mentioned above, backpropagation through time gets very hard over such long sequences.

Yes, the introduction of the LSTM has reduced this by a very large margin, but when sequences are this long you can still face such problems.

So one way is gradient clipping. That means you set an upper bound on the gradients. Refer to this stackoverflow question.
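
As a minimal sketch of that idea (the layer sizes, class count, and learning rate here are illustrative assumptions, not from the original question), Keras lets you set the clipping bound directly on the optimizer:

```python
import tensorflow as tf

# Illustrative model: variable-length univariate sequences, 5 classes assumed.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(None, 1)),   # (time steps, features)
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(5, activation="softmax"),
])

# clipnorm rescales each weight's gradient so its norm never exceeds 1.0;
# clipvalue=1.0 would instead cap each gradient element. Either one sets
# the "upper bound" on gradients described above.
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3, clipnorm=1.0)
model.compile(optimizer=optimizer, loss="sparse_categorical_crossentropy")
```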

Then, to the question you asked:

What are some methods to effectively 'chunk' these sequences?

One way is truncated backpropagation through time. There are a number of ways to implement this truncated BPTT. The simple idea is:

  1. Calculate the gradients only for a given number of time steps. That means if your sequence is 200 time steps and you only give 10 time steps, it will calculate the gradient only for those 10 time steps and then pass the memory value stored at the end of those 10 time steps to the next chunk (as the initial cell state). This is the method TensorFlow uses to compute truncated BPTT; a runnable sketch follows this list.

  2. Take the full sequence, but backpropagate the gradients only for some given number of time steps, starting from a selected time block. It's a continuous way: the forward pass runs over the whole sequence, and only the backward pass is truncated.
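
Below is a hedged sketch of the first idea in Keras: one long sequence is cut into fixed windows, and a stateful LSTM carries its cell state from one window to the next, so gradients are computed only within each window. The window size, layer sizes, dummy data, and the choice to repeat the sequence-level label per window are all assumptions for illustration.

```python
import numpy as np
import tensorflow as tf

TIMESTEPS, FEATURES, WINDOW, NUM_CLASSES = 90_000, 1, 200, 5  # 90,000 splits into 450 windows

# stateful=True keeps the cell state between batches instead of resetting it,
# which requires a fixed batch size (here: one sequence at a time).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(WINDOW, FEATURES), batch_size=1),
    tf.keras.layers.LSTM(64, stateful=True),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

sequence = np.random.randn(1, TIMESTEPS, FEATURES).astype("float32")  # dummy instance
label = np.array([2])  # the single label for the whole sequence, reused per window

model.reset_states()  # clear state before each new, independent sequence
for start in range(0, TIMESTEPS, WINDOW):
    window = sequence[:, start:start + WINDOW, :]
    # Backpropagation stops at the window boundary: the cell state is carried
    # forward as data, but no gradient flows into earlier windows.
    model.train_on_batch(window, label)
```

The second idea would instead need a custom training loop (for example with tf.GradientTape) that runs the forward pass over the whole sequence and truncates only the backward pass; the article linked below walks through both styles.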

Here is the best article I found explaining these truncated BPTT methods, and it is very easy to follow. Refer to Styles of Truncated Backpropagation.

