Batch-major vs time-major LSTM


Question

Do RNNs learn different dependency patterns when the input is batch-major as opposed to time-major?

Answer

(Sorry, my initial argument was about why it makes sense, but I realized that it doesn't, so this is a little off-topic.)

I haven't found the TF group's reasoning behind this, but it does not make computational sense, as the ops are written in C++.

Intuitively, we want to mash up (multiply/add, etc.) different features from the same sequence at the same timestep. Different timesteps can't be done in parallel while batches/sequences can, so feature > batch/sequence > timestep.
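
For intuition, here is a minimal sketch of one vanilla-RNN step in plain NumPy (not TensorFlow's actual kernels; the shapes, names and the time-major layout are illustrative assumptions): the whole batch at a given timestep is handled by a single matmul, while the timesteps themselves have to run one after another.

import numpy as np

time_steps, batch, features, hidden = 5, 32, 10, 20
x = np.random.randn(time_steps, batch, features)   # [time, batch, feature] input
W = np.random.randn(features, hidden) * 0.01       # input-to-hidden weights
U = np.random.randn(hidden, hidden) * 0.01         # hidden-to-hidden weights
b = np.zeros(hidden)

h = np.zeros((batch, hidden))
for t in range(time_steps):                        # timesteps are inherently sequential
    h = np.tanh(x[t] @ W + h @ U + b)              # whole batch in parallel per timestep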

By default, Numpy and C++ use row-major (C-like) memory layout, so

[[ 0.  1.  2.]
 [ 3.  4.  5.]
 [ 6.  7.  8.]]

is laid out in memory like [0,1,2,3,4,5,6,7,8]. This means that if we have

x = np.zeros([time,batch,feature])

(time_major=True in TensorFlow)

In row-major memory we get a layout like x[0,0,0], x[0,0,1], x[0,0,2], …, x[0,1,0], ..., so, for example, the dot product of the weights with a vector from the same sequence and timestep (w*x[t,b,:]) is the most contiguous operation, followed by the next sequence, w*x[t,b+1,:], etc. This is what we want during training.
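
As a small sanity check of that layout claim (NumPy only; the shapes are arbitrary), the strides of a C-ordered [time, batch, feature] array show that the feature axis is contiguous and the batch axis has the next-smallest stride, so x[t,b,:] and x[t,b+1,:] sit next to each other in memory:

import numpy as np

x = np.zeros([4, 3, 5])                            # [time, batch, feature], float64
print(x.strides)                                   # byte strides of (time, batch, feature)
> (120, 40, 8)

print(x[0, 1].ctypes.data - x[0, 0].ctypes.data)   # next sequence, same timestep: 40 bytes away
> 40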

With time_major=False, which is the default, we have [batch, time, feature], so, for example, features from the same sequence but different timesteps are more contiguous, i.e. w*x[batch,t,:] followed by w*x[batch,t+1,:], etc. This might be faster for predicting one sequence at a time if the RNN is rolled out, but this is speculation.
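
Note that the two layouts differ only by a swap of the first two axes. A small sketch of this (NumPy; in TensorFlow, tf.transpose would produce a new tensor rather than a view):

import numpy as np

x_batch_major = np.zeros([3, 4, 5])                      # [batch, time, feature]
x_time_major = np.transpose(x_batch_major, (1, 0, 2))    # [time, batch, feature]

print(x_time_major.shape)
> (4, 3, 5)

print(x_time_major.base is x_batch_major)                # a strided view, no data copied
> True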

If you came to this question for the same reason I did: I learned to be careful with the slightly unintuitive Numpy indexing, which is meant to be pythonic, not necessarily row-major. Look at this. As expected:

import numpy as np

x = np.zeros([3,3])
x[0:9].flat = np.arange(10)
print(x)
>   [[ 0.  1.  2.]
>    [ 3.  4.  5.]
>    [ 6.  7.  8.]]

We might also expect x[1] == x[0,1], but

print(x[1])
> [ 3.  4.  5.]

print(x[np.arange(10)<=4])
> IndexError: index 3 is out of bounds for axis 0 with size 3
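
If you do want flat, row-major element access, x.flat or x.ravel() gives it explicitly (continuing with the same 3x3 array as above):

print(x.flat[4])          # 5th element in memory (row-major) order
> 4.0

print(x.ravel()[4])
> 4.0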

