How do I modify this PyTorch convolutional neural network to accept a 64 x 64 image and properly output predictions?


Problem Description


I took this convolutional neural network (CNN) from here. It accepts 32 x 32 images and defaults to 10 classes. However, I have 64 x 64 images with 500 classes. When I pass in 64 x 64 images (batch size held constant at 32), I get the following error.

ValueError: Expected input batch_size (128) to match target batch_size (32).

The stack trace starts at the line loss = loss_fn(outputs, labels). The outputs.shape is [128, 500] and the labels.shape is [32].

The code is listed here for completeness.

import torch.nn as nn

class Unit(nn.Module):
    def __init__(self,in_channels,out_channels):
        super(Unit,self).__init__()
        self.conv = nn.Conv2d(in_channels=in_channels,kernel_size=3,out_channels=out_channels,stride=1,padding=1)
        self.bn = nn.BatchNorm2d(num_features=out_channels)
        self.relu = nn.ReLU()

    def forward(self,input):
        output = self.conv(input)
        output = self.bn(output)
        output = self.relu(output)
        return output

class SimpleNet(nn.Module):
    def __init__(self,num_classes=10):
        super(SimpleNet,self).__init__()

        self.unit1 = Unit(in_channels=3,out_channels=32)
        self.unit2 = Unit(in_channels=32, out_channels=32)
        self.unit3 = Unit(in_channels=32, out_channels=32)

        self.pool1 = nn.MaxPool2d(kernel_size=2)

        self.unit4 = Unit(in_channels=32, out_channels=64)
        self.unit5 = Unit(in_channels=64, out_channels=64)
        self.unit6 = Unit(in_channels=64, out_channels=64)
        self.unit7 = Unit(in_channels=64, out_channels=64)

        self.pool2 = nn.MaxPool2d(kernel_size=2)

        self.unit8 = Unit(in_channels=64, out_channels=128)
        self.unit9 = Unit(in_channels=128, out_channels=128)
        self.unit10 = Unit(in_channels=128, out_channels=128)
        self.unit11 = Unit(in_channels=128, out_channels=128)

        self.pool3 = nn.MaxPool2d(kernel_size=2)

        self.unit12 = Unit(in_channels=128, out_channels=128)
        self.unit13 = Unit(in_channels=128, out_channels=128)
        self.unit14 = Unit(in_channels=128, out_channels=128)

        self.avgpool = nn.AvgPool2d(kernel_size=4)

        self.net = nn.Sequential(self.unit1, self.unit2, self.unit3, self.pool1, self.unit4, self.unit5,
                                 self.unit6, self.unit7, self.pool2, self.unit8, self.unit9, self.unit10,
                                 self.unit11, self.pool3, self.unit12, self.unit13, self.unit14, self.avgpool)

        self.fc = nn.Linear(in_features=128,out_features=num_classes)

    def forward(self, input):
        output = self.net(input)
        output = output.view(-1,128)
        output = self.fc(output)
        return output
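
For reference, the mismatch is easy to reproduce with random data (a minimal check, assuming 500 classes, batch size 32, and 64 x 64 RGB inputs):

import torch

model = SimpleNet(num_classes=500)
outputs = model(torch.randn(32, 3, 64, 64))   # a random batch of 32 images
print(outputs.shape)                          # torch.Size([128, 500]) instead of [32, 500]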

Any ideas on how to modify this CNN to accept 64 x 64 images and properly output predictions?

Solution

The problem is an incompatible reshape (view) at the end.

You're using a sort of "flattening" at the end, which is different from "global pooling". Both are valid for CNNs, but only global pooling is compatible with any image size.

The flattened net (your case)

In your case, with flattening, you need to keep track of how the spatial dimensions shrink so you know how to reshape at the end.

So:

  • Input: 64x64
  • Pool1: 32x32
  • Pool2: 16x16
  • Pool3: 8x8
  • AvgPool: 2x2

Then, at the end you've got a shape of (batch, 128, 2, 2), four times as many features as you would get from a 32x32 image, which ends at (batch, 128, 1, 1).
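
This can be double-checked by running only the convolutional part of the network (a quick probe, using the SimpleNet posted in the question):

import torch

model = SimpleNet(num_classes=500)
features = model.net(torch.randn(32, 3, 64, 64))   # conv/pool stack, including the avgpool
print(features.shape)                              # torch.Size([32, 128, 2, 2])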

Then, your final reshape should be output = output.view(-1,128*2*2).

This is a different net with a different classification layer, though, because the linear layer now needs in_features=512.
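
To make that concrete, here is a minimal sketch of the flattened variant for 64 x 64 inputs (SimpleNet64 is just an illustrative name; it reuses the SimpleNet/Unit definitions from the question and only swaps the reshape and the classifier):

import torch
import torch.nn as nn

class SimpleNet64(SimpleNet):
    def __init__(self, num_classes=500):
        super().__init__(num_classes=num_classes)
        # with 64x64 inputs the avgpool leaves a 2x2 map, so 128 * 2 * 2 = 512 features
        self.fc = nn.Linear(in_features=128 * 2 * 2, out_features=num_classes)

    def forward(self, input):
        output = self.net(input)                 # (batch, 128, 2, 2)
        output = output.view(-1, 128 * 2 * 2)    # (batch, 512)
        return self.fc(output)

model = SimpleNet64()
print(model(torch.randn(32, 3, 64, 64)).shape)   # torch.Size([32, 500])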

The global pooling net

On the other hand, you could use the same model, same layers and same weights for any image size >= 32 if you replace the last pooling with a global pooling:

def flatChannels(x):
    # (batch, channels, height, width) -> (batch, channels, height * width)
    size = x.size()
    return x.view(size[0], size[1], size[2] * size[3])

def globalAvgPool2D(x):
    # average over the flattened spatial dimension -> (batch, channels)
    return flatChannels(x).mean(dim=-1)

def globalMaxPool2D(x):
    # max over the flattened spatial dimension -> (batch, channels);
    # .max(dim=-1) returns (values, indices), so keep only the values
    return flatChannels(x).max(dim=-1).values
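
A quick usage example of these helpers: both collapse the spatial dimensions, so the result is (batch, channels) regardless of the height and width of the feature map.

import torch

x = torch.randn(32, 128, 2, 2)       # an arbitrary (batch, channels, H, W) feature map
print(globalAvgPool2D(x).shape)      # torch.Size([32, 128])
print(globalMaxPool2D(x).shape)      # torch.Size([32, 128])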

The ending of the model:

        # removed the avgpool from the Sequential here; global pooling is applied in forward instead
        self.net = nn.Sequential(self.unit1, self.unit2, self.unit3, self.pool1, self.unit4,
                                 self.unit5, self.unit6, self.unit7, self.pool2, self.unit8,
                                 self.unit9, self.unit10, self.unit11, self.pool3,
                                 self.unit12, self.unit13, self.unit14)

        self.fc = nn.Linear(in_features=128, out_features=num_classes)

    def forward(self, input):
        output = self.net(input)
        output = globalAvgPool2D(output)  # or globalMaxPool2D
        output = self.fc(output)
        return output
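
A quick sanity check (assuming SimpleNet is rebuilt with this ending and the pooling helpers above): the same model and the same weights now handle different input sizes.

import torch

model = SimpleNet(num_classes=500)
for size in (32, 64):
    x = torch.randn(8, 3, size, size)          # a batch of 8 random images
    print(size, model(x).shape)                # torch.Size([8, 500]) for both sizes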
