Torch / Lua, how to correctly implement minibatch training in a siamese neural network?
Question
I'm still working on my implementation of a siamese neural network in Torch, as mentioned in some of my previous questions. I finally got a good working implementation of it, but now I'd like to add mini-batch training. That is, I would like to train the siamese neural network with a set of training elements at a time, instead of using just one.
Unfortunately, my implementation for minibatches of size 2 does not work. There's a problem in the back-propagation of the error that I cannot solve. Here's the main architecture:
th> perceptron_general
nn.Sequential {
[input -> (1) -> output]
(1): nn.ParallelTable {
input
|`-> (1): nn.Sequential {
| [input -> (1) -> (2) -> output]
| (1): nn.ParallelTable {
| input
| |`-> (1): nn.Sequential {
| | [input -> (1) -> (2) -> (3) -> (4) -> (5) -> output]
| | (1): nn.Linear(6 -> 3)
| | (2): nn.Tanh
| | (3): nn.Dropout
| | (4): nn.Linear(3 -> 2)
| | (5): nn.Tanh
| | }
| |`-> (2): nn.Sequential {
| | [input -> (1) -> (2) -> (3) -> (4) -> (5) -> output]
| | (1): nn.Linear(6 -> 3)
| | (2): nn.Tanh
| | (3): nn.Dropout
| | (4): nn.Linear(3 -> 2)
| | (5): nn.Tanh
| | }
| ... -> output
| }
| (2): nn.CosineDistance
| }
|`-> (2): nn.Sequential {
| [input -> (1) -> (2) -> output]
| (1): nn.ParallelTable {
| input
| |`-> (1): nn.Sequential {
| | [input -> (1) -> (2) -> (3) -> (4) -> (5) -> output]
| | (1): nn.Linear(6 -> 3)
| | (2): nn.Tanh
| | (3): nn.Dropout
| | (4): nn.Linear(3 -> 2)
| | (5): nn.Tanh
| | }
| |`-> (2): nn.Sequential {
| | [input -> (1) -> (2) -> (3) -> (4) -> (5) -> output]
| | (1): nn.Linear(6 -> 3)
| | (2): nn.Tanh
| | (3): nn.Dropout
| | (4): nn.Linear(3 -> 2)
| | (5): nn.Tanh
| | }
| ... -> output
| }
| (2): nn.CosineDistance
| }
... -> output
}
}
I have an upper neural network put together with a lower neural network. Both are inserted into a parallel table, and this parallel table is then inserted into a perceptron. The same is done for a second parallel table. The two parallel-table perceptrons are then put together into a general parallel table, which is inserted into a general perceptron.
I think this architecture is right, but I'm missing something in the gradientUpdate function.
Here's my code:
-- rounds a real number num to the number having idp values after the dot
function round(num, idp)
local mult = 10^(idp or 0)
return math.floor(num * mult + 0.5) / mult
end
idp = 4
-- change the sign of an array
function changeSignToArray(array)
newArray={}
for i=1,#array do
newArray[i]= -1 * array[i]
end
return newArray;
end
-- subtable function
function subtable(table, lower_index, upper_index)
return_table = {}
k = 1
for i=lower_index,upper_index do
return_table[k] = table[i]
k = k+1
end
return return_table;
end
-- training
function gradientUpdate(perceptron, dataset, target, learningRate)
temp_dataset = dataset
temp_target = target
temp_perceptron = perceptron
print("### new gradientUpdate() ###");
print("#dataset "..#dataset);
print("(#dataset[1][1])[1] "..(#dataset[1][1])[1]);
print("#target "..#target);
predictionValue = (perceptron:forward(dataset)[1])[1]
print('predictionValue '..predictionValue);
-- if predictionValue*target < 1 then
realTarget=changeSignToArray(target)
gradientWrtOutput = torch.Tensor(realTarget)
temp_gradient = gradientWrtOutput
perceptron:zeroGradParameters()
perceptron:backward(dataset, gradientWrtOutput)
perceptron:updateParameters(learningRate)
-- end
return perceptron;
end
require "os"
require "nn"
dropOutFlag=TRUE
input_number=6
hiddenUnits=3
output_number=2
hiddenLayers=5
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
-- LET'S PREPARE THE DATA -- -- -- -- -- -- --
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
dim = 483
trainDataset = {};
targetDataset = {}
for i=1,dim do
trainDataset[i]={torch.rand(input_number), torch.rand(input_number)}
if i%2==0 then targetDataset[i] = 1
else targetDataset[i] = -1
end
end
function targetDataset:size() return #targetDataset end
target = -1 -- the target for cosine similarity is +1 for genuine signatures, and -1 for forgeries
io.write("#trainDataset="..#trainDataset.." \n");
io.write("#trainDataset[1]="..#trainDataset[1].." \n");
io.write("#targetDataset="..#targetDataset.." \n");
-- matrix having 5 rows * 2 columns
max_iterations = 25
learnRate = 0.1
minibatchSize = 10
for m=30,1,-1 do
if (dim % m) == 0 then minibatchSize=m; break; end
end
minibatchSize = 2
print('minibatchSize='..minibatchSize);
span_number = dim/minibatchSize
print('span_number '..span_number);
minibatch_train = {torch.Tensor(span_number)}
target_train = {torch.Tensor(span_number)}
i=1
for m=1, span_number do
minibatch_train[i] = torch.Tensor(minibatchSize)
target_train[i] = torch.Tensor(minibatchSize)
lower_index = 1+minibatchSize*(m-1)
upper_index = (m-1)*minibatchSize+minibatchSize
io.write("i= "..i.." lower_index ".. lower_index)
io.write(" upper_index "..upper_index.."\n")
minibatch_train[i] = subtable(trainDataset, lower_index, upper_index)
target_train[i] = subtable(targetDataset, lower_index, upper_index)
i = i + 1
end
print('\n#minibatch_train '.. #minibatch_train);
print('#minibatch_train[1] '.. #minibatch_train[1]);
print('#target_train '.. #target_train);
print('#target_train[1] '.. #target_train[1]..'\n');
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
-- LET'S PREPARE THE SIAMESE NEURAL NETWORK --
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
-- imagine we have one network we are interested in, it is called "perceptronUpper"
perceptronUpper= nn.Sequential()
perceptronUpper:add(nn.Linear(input_number, hiddenUnits))
perceptronUpper:add(nn.Tanh())
if dropOutFlag==TRUE then perceptronUpper:add(nn.Dropout()) end
-- for w=1, hiddenLayers do
-- perceptronUpper:add(nn.Linear(hiddenUnits,hiddenUnits))
-- perceptronUpper:add(nn.Tanh())
-- if dropOutFlag==TRUE then perceptronUpper:add(nn.Dropout()) end
-- end
perceptronUpper:add(nn.Linear(hiddenUnits,output_number))
perceptronUpper:add(nn.Tanh())
-- But we want to push examples towards or away from each other
-- so we make another copy of it called perceptronLower
-- this *shares* the same weights via the set command, but has its own set of temporary gradient storage
-- that's why we create it again (so that the gradients of the pair don't wipe each other)
perceptronLower= perceptronUpper:clone('weight', 'gradWeights', 'gradBias', 'bias')
-- updates the gradient weights and gradient bias
-- we make a parallel table that takes a pair of examples as input. they both go through the same (cloned) perceptron
-- ParallelTable is a container module that, in its forward() method, applies the i-th member module to the i-th input, and outputs a table of the set of outputs.
parallel_table = nn.ParallelTable()
parallel_table:add(perceptronUpper)
parallel_table:add(perceptronLower)
-- now we define our top level network that takes this parallel table and computes the cosine distance between
-- the pair of outputs
perceptron= nn.Sequential()
perceptron:add(parallel_table)
perceptron:add(nn.CosineDistance())
-- For the minibatch
general_parallel= nn.ParallelTable()
for mb=1,minibatchSize do
general_parallel:add(perceptron)
end
perceptron_general = nn.Sequential()
perceptron_general:add(general_parallel)
-- -- # TRAINING:
-- -- training on only 1 example for TRUE
for i = 1, max_iterations do
perceptron_general = gradientUpdate(perceptron_general, minibatch_train[1], target_train[1], learnRate)
perceptron_general = round((perceptron_general:forward(dataset)[1]),idp);
io.write("i="..i..") optimization predictionValue= "..prediction.."\n");
if(prediction==target) then io.write("\tprediction==target OUT"); break end
end
The problem comes from the call to the backward() function. Possibly there's a problem with the dimensions...
Do you have any ideas on how to solve this?
Recommended answer
The problem comes from the call to the backward() function. Possibly there's a problem with the dimensions...
Technically speaking, given the structure of perceptron_general, when you call backward the 2nd argument (= gradOutput) should be a table made of 2 x 1D tensors (i.e. one gradOutput per branch of your top parallel table), which gives something like:
gradientWrtOutput = {
torch.Tensor{realTarget[1]},
torch.Tensor{realTarget[2]}
}
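For a minibatch of arbitrary size, the same table can be built in a loop. This is only a minimal sketch: it assumes the targets are a plain Lua table of +1 / -1 labels as in the question, it keeps the question's sign flip, and buildGradOutput is a hypothetical helper name, not part of Torch.

```lua
require "torch"

-- Hypothetical helper: builds one 1-element 1D tensor per branch of the
-- top-level ParallelTable, i.e. one entry per minibatch element.
local function buildGradOutput(target)
  local gradOutput = {}
  for k = 1, #target do
    -- same sign flip as changeSignToArray() in the question
    gradOutput[k] = torch.Tensor{-target[k]}
  end
  return gradOutput
end

-- example with a minibatch of size 2
local gradientWrtOutput = buildGradOutput({1, -1})
print(#gradientWrtOutput)            -- number of branches
print(gradientWrtOutput[1]:size(1))  -- length of each 1D tensor
```

The resulting table can be passed as the second argument of perceptron:backward(dataset, gradientWrtOutput) in place of the single torch.Tensor(realTarget).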
Note: after that, another error shows up in your main training loop.
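The answer does not say which error this is. Reading the question's loop, one likely candidate is that perceptron_general is overwritten with the result of round(), while dataset and prediction are never defined at that scope. The sketch below is one possible self-contained rewrite under that assumption, not the author's fix. It also makes some simplifying choices of its own: Dropout is left out so the printed values are deterministic, the data is random toy data, buildPair is a hypothetical helper, and each minibatch branch is a clone sharing only 'weight' and 'bias' (so every branch keeps its own output and gradient buffers, as the comment in the question intends).

```lua
require "nn"

local input_number, hiddenUnits, output_number = 6, 3, 2
local minibatchSize, max_iterations, learnRate = 2, 25, 0.1

-- Hypothetical helper: one siamese pair (two weight-sharing branches
-- followed by a cosine distance), as in the question but without Dropout.
local function buildPair()
  local upper = nn.Sequential()
  upper:add(nn.Linear(input_number, hiddenUnits))
  upper:add(nn.Tanh())
  upper:add(nn.Linear(hiddenUnits, output_number))
  upper:add(nn.Tanh())
  -- share parameters only; gradient buffers stay separate
  local lower = upper:clone('weight', 'bias')
  local branches = nn.ParallelTable()
  branches:add(upper)
  branches:add(lower)
  local pair = nn.Sequential()
  pair:add(branches)
  pair:add(nn.CosineDistance())
  return pair
end

-- Top-level network: one pair per minibatch element, all sharing weights.
local firstPair = buildPair()
local general_parallel = nn.ParallelTable()
general_parallel:add(firstPair)
for mb = 2, minibatchSize do
  general_parallel:add(firstPair:clone('weight', 'bias'))
end
local perceptron_general = nn.Sequential()
perceptron_general:add(general_parallel)

-- Toy minibatch: random input pairs with alternating -1 / +1 targets.
local minibatch, targets = {}, {}
for k = 1, minibatchSize do
  minibatch[k] = {torch.rand(input_number), torch.rand(input_number)}
  targets[k] = (k % 2 == 0) and 1 or -1
end

local function gradientUpdate(net, batch, target, learningRate)
  net:forward(batch)
  -- gradOutput is a table with one 1-element tensor per branch
  local gradOutput = {}
  for k = 1, #target do
    gradOutput[k] = torch.Tensor{-target[k]}
  end
  net:zeroGradParameters()
  net:backward(batch, gradOutput)
  net:updateParameters(learningRate)
  return net
end

for i = 1, max_iterations do
  perceptron_general = gradientUpdate(perceptron_general, minibatch, targets, learnRate)
  -- keep the network and its predictions in separate variables
  local output = perceptron_general:forward(minibatch)
  io.write("i=" .. i .. ")")
  for k = 1, #output do
    io.write(string.format(" prediction[%d]=%.4f", k, output[k][1]))
  end
  io.write("\n")
end
```

The main point of the loop is that the network returned by gradientUpdate is never replaced by a number, and the prediction is read from the table that forward() returns, one 1-element tensor per minibatch element.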