Batch-major vs time-major LSTM
Question
Do RNNs learn different dependency patterns when the input is batch-major as opposed to time-major?
Answer
(Sorry, my initial argument was why it makes sense, but I realized that it doesn't, so this is a little off-topic.)
I haven't found the TF-groups' reasoning behind this, but it does not make computational sense, as the ops are written in C++.
Intuitively, we want to mash up (multiply/add, etc.) different features from the same sequence at the same timestep. Different timesteps can't be done in parallel while batches/sequences can, so feature > batch/sequence > timestep.
Numpy and C++ use row-major (C-like) memory layout by default, so

[[ 0.  1.  2.]
 [ 3.  4.  5.]
 [ 6.  7.  8.]]

is laid out in memory like [0, 1, 2, 3, 4, 5, 6, 7, 8]. This means that if we have

x = np.zeros([time, batch, feature])

(time_major=True in tensorflow)
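The row-major layout described above can be checked directly in numpy; a minimal sketch (the 3x3 example matrix is rebuilt here):

```python
import numpy as np

# Rebuild the 3x3 example matrix and confirm it is C (row-major)
# contiguous, i.e. its memory is one flat run 0..8.
x = np.arange(9, dtype=float).reshape(3, 3)
print(x.flags['C_CONTIGUOUS'])  # True: numpy defaults to row-major
print(x.ravel(order='K'))       # memory order: [0. 1. 2. ... 8.]
```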
In row-major memory we get a layout like x[0,0,0], x[0,0,1], x[0,0,2], ..., x[0,1,0], ..., so e.g. the dot product of the weights and a vector from the same sequence and timestep (w*x[t,b,:]) is the most contiguous operation, followed by the next sequence, w*x[t,b+1,:], etc. This is what we want during training.
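The contiguity argument can be made concrete with strides; a small sketch (the sizes for time, batch and feature are made up):

```python
import numpy as np

# Time-major layout: [time, batch, feature], as with time_major=True.
time, batch, feature = 4, 2, 3
x = np.zeros([time, batch, feature])  # float64, C order

# Strides in bytes: the feature axis has the smallest stride, so one
# feature vector x[t, b, :] occupies adjacent memory, and the next
# sequence's vector x[t, b+1, :] follows immediately after it.
print(x.strides)                         # (48, 24, 8)
print(x[0, 0, :].flags['C_CONTIGUOUS'])  # True: a contiguous slice
```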
With time_major=False, which is the default, we have [batch, time, feature], so e.g. features from the same sequence but different timesteps are more contiguous, i.e. w*x[batch,t,:] followed by w*x[batch,t+1,:], etc. This might be faster for predicting one sequence at a time if the RNN is unrolled, but this is speculation.
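Switching between the two layouts is just a transpose of the first two axes; a hedged sketch (shapes are made up, and np.transpose stands in for the analogous tf.transpose):

```python
import numpy as np

# Batch-major layout: [batch, time, feature], as with time_major=False.
x_bm = np.zeros([2, 4, 3])

# Transposing to [time, batch, feature] only swaps strides: it returns
# a view, so the bytes in memory are still in batch-major order.
x_tm = np.transpose(x_bm, (1, 0, 2))
print(x_tm.shape)                  # (4, 2, 3)
print(x_tm.flags['C_CONTIGUOUS'])  # False: a strided view

# ascontiguousarray copies into a genuinely time-major buffer.
x_tm = np.ascontiguousarray(x_tm)
print(x_tm.flags['C_CONTIGUOUS'])  # True
```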
If you came to this question for the same reason I did: I learned to be careful with the slightly unintuitive Numpy indexing, which is meant to be pythonic, not necessarily row-major. Look at this. As expected:
x = np.zeros([3, 3])
x[0:9].flat = np.arange(9)
print(x)
> [[ 0.  1.  2.]
>  [ 3.  4.  5.]
>  [ 6.  7.  8.]]
We might also expect x[1] == x[0,1], but

print(x[1])
> [ 3.  4.  5.]
print(x[np.arange(10) <= 4])
> IndexError: index 3 is out of bounds for axis 0 with size 3
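When flat, row-major indexing is what you actually want, ask for it explicitly; a small sketch of the safe alternatives:

```python
import numpy as np

x = np.zeros([3, 3])
x.flat = np.arange(9)

# x.flat and x.ravel() index the row-major memory order explicitly,
# so there is no ambiguity with the pythonic per-axis indexing.
print(x.flat[1])                     # 1.0: flat element 1, same as x[0, 1]
print(x[1])                          # [3. 4. 5.]: row 1, not flat element 1
print(x.ravel()[np.arange(9) <= 4])  # [0. 1. 2. 3. 4.]: flat boolean mask
```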