Input dimension mismatch binary crossentropy Lasagne and Theano

Problem description

I read all posts on the net addressing the issue where people forgot to change the target vector to a matrix, and since the problem remains after this change, I decided to ask my question here. Workarounds are mentioned below, but new problems showed up, and I am thankful for suggestions!

Using a convolutional network setup and binary crossentropy with a sigmoid activation function, I get a dimension mismatch problem, but not during training, only during validation / test data evaluation. For some strange reason, one of my validation set vectors gets its dimensions switched and I have no idea why. Training, as mentioned above, works fine. Code follows below, thanks a lot for your help (and sorry for hijacking the thread, but I saw no reason to create a new one); most of it is copied from the Lasagne tutorial example.

Workarounds and new problems:

  1. Removing "axis=1" in the valAcc definition helps, but then the validation accuracy is constantly zero and the test classification always returns the same result, no matter how many nodes, layers, filters etc. I have. Even changing the training set size (I have around 350 samples per class of 48x64 grayscale images) does not change this. So something seems to be off.
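
To make the axis question concrete, here is a small numpy illustration (my own, not from the original post) of what argmax does on the (batch, 1) prediction matrix that a single sigmoid output unit produces:

    import numpy as np

    preds = np.random.rand(52, 1)        # sigmoid outputs for a batch of 52
    print(np.argmax(preds, axis=1))      # 52 zeros: each row has only one column
    print(np.argmax(preds))              # without axis: one scalar index into the
                                         # flattened array, which then broadcasts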

Network creation:

import lasagne

def build_cnn(imgSet, input_var=None):
    # As a third model, we'll create a CNN of two convolution + pooling stages
    # and a fully-connected hidden layer in front of the output layer.

    # Input layer using shape information from training
    network = lasagne.layers.InputLayer(
        shape=(None, imgSet.shape[1], imgSet.shape[2], imgSet.shape[3]),
        input_var=input_var)
    # This time we do not apply input dropout, as it tends to work less well
    # for convolutional layers.

    # Convolutional layer with 32 kernels of size 5x5. Strided and padded
    # convolutions are supported as well; see the docstring.
    network = lasagne.layers.Conv2DLayer(
            network, num_filters=32, filter_size=(5, 5),
            nonlinearity=lasagne.nonlinearities.rectify,
            W=lasagne.init.GlorotUniform())

    # Max-pooling layer of factor 2 in both dimensions:
    network = lasagne.layers.MaxPool2DLayer(network, pool_size=(2, 2))

    # Another convolution with 16 5x5 kernels, and another 2x2 pooling:
    network = lasagne.layers.Conv2DLayer(
            network, num_filters=16, filter_size=(5, 5),
            nonlinearity=lasagne.nonlinearities.rectify)

    network = lasagne.layers.MaxPool2DLayer(network, pool_size=(2, 2))

    # A fully-connected layer of 64 units with 25% dropout on its inputs:
    network = lasagne.layers.DenseLayer(
            lasagne.layers.dropout(network, p=.25),
            num_units=64,
            nonlinearity=lasagne.nonlinearities.rectify)

    # And, finally, the 1-unit sigmoid output layer with 50% dropout on its inputs:
    network = lasagne.layers.DenseLayer(
            lasagne.layers.dropout(network, p=.5),
            num_units=1,
            nonlinearity=lasagne.nonlinearities.sigmoid)

    return network
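
As a quick sanity check (my own addition, with a made-up dummy array), Lasagne's get_output_shape should report a (None, 1) output shape for this network, i.e. one sigmoid unit per sample:

    import numpy as np

    # dummy stack of ten single-channel 48x64 images in the
    # (batch, channel, height, width) layout InputLayer expects
    dummy = np.zeros((10, 1, 48, 64), dtype='float32')
    net = build_cnn(dummy)
    print(lasagne.layers.get_output_shape(net))  # (None, 1)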

Target matrices for all sets are created like this (training target vector as an example):

targetsTrain = np.vstack((targetsTrain, [[targetClass], ] * numTr))
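
As a sanity check (my own illustration, with made-up values for numTr, targetClass, and the initial array), this stacking yields the (N, 1) integer matrix that binary_crossentropy expects as its target, rather than a flat vector:

    import numpy as np

    targetClass = 1
    numTr = 4
    targetsTrain = np.empty((0, 1), dtype='int8')  # assumed starting point
    targetsTrain = np.vstack((targetsTrain, [[targetClass], ] * numTr))
    print(targetsTrain.shape)  # (4, 1): one row per sample, one column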

...and the theano variables as such:

import theano
import theano.tensor as T
from theano import function

inputVar = T.tensor4('inputs')
targetVar = T.imatrix('targets')
network = build_cnn(trainset, inputVar)
predictions = lasagne.layers.get_output(network)
loss = lasagne.objectives.binary_crossentropy(predictions, targetVar)
loss = loss.mean()
params = lasagne.layers.get_all_params(network, trainable=True)
updates = lasagne.updates.nesterov_momentum(loss, params, learning_rate=0.01, momentum=0.9)
valPrediction = lasagne.layers.get_output(network, deterministic=True)
valLoss = lasagne.objectives.binary_crossentropy(valPrediction, targetVar)
valLoss = valLoss.mean()
valAcc = T.mean(T.eq(T.argmax(valPrediction, axis=1), targetVar), dtype=theano.config.floatX)
train_fn = function([inputVar, targetVar], loss, updates=updates, allow_input_downcast=True)
val_fn = function([inputVar, targetVar], [valLoss, valAcc])

Finally, here are the two loops, training and test. The first is fine; the second throws the error, excerpts below.
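
(iterate_minibatches is not defined in the post; assuming it is the helper from the Lasagne MNIST tutorial that the rest of the code was copied from, it would look like this:)

    import numpy as np

    def iterate_minibatches(inputs, targets, batchsize, shuffle=False):
        assert len(inputs) == len(targets)
        if shuffle:
            indices = np.arange(len(inputs))
            np.random.shuffle(indices)
        for start_idx in range(0, len(inputs) - batchsize + 1, batchsize):
            if shuffle:
                excerpt = indices[start_idx:start_idx + batchsize]
            else:
                excerpt = slice(start_idx, start_idx + batchsize)
            yield inputs[excerpt], targets[excerpt]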

# -- Neural network training itself -- #
numIts = 100
for itNr in range(0, numIts):
    train_err = 0
    train_batches = 0
    for batch in iterate_minibatches(trainset.astype('float32'), targetsTrain.astype('int8'), len(trainset)//4, shuffle=True):
        inputs, targets = batch
        print(inputs.shape)
        print(targets.shape)
        train_err += train_fn(inputs, targets)
        train_batches += 1

    # And a full pass over the validation data:
    val_err = 0
    val_acc = 0
    val_batches = 0

    for batch in iterate_minibatches(valset.astype('float32'), targetsVal.astype('int8'), len(valset)//3, shuffle=False):
        [inputs, targets] = batch
        [err, acc] = val_fn(inputs, targets)
        val_err += err
        val_acc += acc
        val_batches += 1

Error (excerpt)

Exception "unhandled ValueError"
Input dimension mis-match. (input[0].shape[1] = 52, input[1].shape[1] = 1)
Apply node that caused the error: Elemwise{eq,no_inplace}(DimShuffle{x,0}.0, targets)
Toposort index: 36
Inputs types: [TensorType(int64, row), TensorType(int32, matrix)]
Inputs shapes: [(1, 52), (52, 1)]
Inputs strides: [(416, 8), (4, 4)]
Inputs values: ['not shown', 'not shown']
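
The shapes in the traceback tell the story: after the axis=1 argmax, Theano dimshuffles the resulting (52,) vector into a (1, 52) row and compares it against the (52, 1) target matrix. A quick numpy sketch of the two operands (my own, not from the post):

    import numpy as np

    left = np.zeros((1, 52))   # DimShuffle{x,0} of the argmax result: a row
    right = np.zeros((52, 1))  # the (batch, 1) integer target matrix
    # numpy silently broadcasts these to (52, 52); Theano's Elemwise{eq} only
    # broadcasts dimensions declared broadcastable, so the mismatched second
    # dimension (52 vs. 1) raises the ValueError shown above.
    print((left == right).shape)  # (52, 52) in numpy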

Thanks again for your help!

Answer

So it seems the error is in the evaluation of the validation accuracy. When you remove the "axis=1" in your calculation, the argmax runs over the whole array and returns only a single number. Broadcasting then steps in, which is why you would see the same value for the whole set.

But from the error you posted, the "T.eq" op throws the error because it has to compare a 52 x 1 with a 1 x 52 vector (matrix for theano/numpy). So, I suggest you try replacing the line with:

    valAcc = T.mean(T.eq(T.argmax(valPrediction, axis=1), targetVar.T))

I hope this fixes the error, but I haven't tested it myself.

The error lies in the argmax op that is called. Normally, the argmax is there to determine which of the output units is activated the most. However, in your setting you only have one output neuron, which means that the argmax over all output neurons will always return 0 (as the first argument).

This is why you have the impression that your network always gives you 0 as output.

By replacing:

    valAcc = T.mean(T.eq(T.argmax(valPrediction, axis=1), targetVar.T))

with:

    binaryPrediction = valPrediction > .5
    valAcc = T.mean(T.eq(binaryPrediction, targetVar.T))

you should get the desired result.

I'm just not sure if the transpose is still necessary.
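
For what it's worth, here is a minimal sketch of the full corrected accuracy expression (my own reading, untested): since valPrediction comes from a 1-unit layer it has shape (batch, 1), the same as targetVar, so the comparison should line up without the transpose:

    binaryPrediction = valPrediction > .5           # (batch, 1) thresholded sigmoid
    valAcc = T.mean(T.eq(binaryPrediction, targetVar),
                    dtype=theano.config.floatX)     # both operands are (batch, 1)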
