How to extract a feature vector from a single image in PyTorch?


Problem description

I am trying to understand more about computer vision models and to explore how they work. To get a better sense of how to interpret feature vectors, I'm trying to use PyTorch to extract one. Below is the code I've pieced together from various places.

import torch
import torch.nn as nn
import torchvision.models as models
import torchvision.transforms as transforms
from torch.autograd import Variable
from PIL import Image



img=Image.open("Documents/01235.png")

# Load the pretrained model
model = models.resnet18(pretrained=True)

# Use the model object to select the desired layer
layer = model._modules.get('avgpool')

# Set model to evaluation mode
model.eval()

transforms = torchvision.transforms.Compose([
        torchvision.transforms.Resize(256),
        torchvision.transforms.CenterCrop(224),
        torchvision.transforms.ToTensor(),
        torchvision.transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])
    
def get_vector(image_name):
    # Load the image with Pillow library
    img = Image.open("Documents/Documents/Driven Data Competitions/Hateful Memes Identification/data/01235.png")
    # Create a PyTorch Variable with the transformed image
    t_img = transforms(img)
    # Create a vector of zeros that will hold our feature vector
    # The 'avgpool' layer has an output size of 512
    my_embedding = torch.zeros(512)
    # Define a function that will copy the output of a layer
    def copy_data(m, i, o):
        my_embedding.copy_(o.data)
    # Attach that function to our selected layer
    h = layer.register_forward_hook(copy_data)
    # Run the model on our transformed image
    model(t_img)
    # Detach our copy function from the layer
    h.remove()
    # Return the feature vector
    return my_embedding

pic_vector = get_vector(img)

When I do this I get the following error:

RuntimeError: Expected 4-dimensional input for 4-dimensional weight [64, 3, 7, 7], but got 3-dimensional input of size [3, 224, 224] instead

I'm sure this is an elementary error, but I can't figure out how to fix it. I was under the impression that the ToTensor transformation would make my data 4-D, but either it isn't working correctly or I'm misunderstanding it. I'd appreciate any help, or any resources I can use to learn more about this!

Recommended answer

The standard nn.Modules in PyTorch generally expect an additional batch dimension. If the input to a module has shape (B, ...), then the output will have shape (B, ...) as well (though the later dimensions may change depending on the layer). This allows efficient inference on a batch of B inputs at once. ToTensor does not add this dimension: it returns a (C, H, W) tensor, which is why the model complains about a 3-dimensional input. To make your code conform, unsqueeze an extra dimension of size 1 onto the front of the t_img tensor before sending it into your model, which turns it into a (1, ...) tensor. You will also need to flatten the output of layer before storing it, since avgpool returns a (1, 512, 1, 1) tensor and you want to copy it into your one-dimensional my_embedding tensor.
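The shape changes described above can be checked without loading ResNet. The following is a minimal sketch under stated assumptions: `net` is a hypothetical two-layer stand-in (a convolution shaped like the [64, 3, 7, 7] weight in the error message, followed by global average pooling), and `t_img` is a random tensor standing in for the output of the transform pipeline.

```python
import torch
import torch.nn as nn

# Stand-in for the transform output: ToTensor yields (C, H, W), with no batch dimension.
t_img = torch.rand(3, 224, 224)

# Hypothetical stand-in network (not ResNet): one convolution with a
# [64, 3, 7, 7] weight, followed by global average pooling.
net = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),
    nn.AdaptiveAvgPool2d((1, 1)),
)
net.eval()

batch = t_img.unsqueeze(0)      # (3, 224, 224) -> (1, 3, 224, 224)
with torch.no_grad():
    pooled = net(batch)         # (1, 64, 1, 1): batch dimension is preserved
vector = pooled.flatten()       # (64,): one-dimensional feature vector

print(batch.shape, pooled.shape, vector.shape)
```

With the real resnet18, the same pattern gives a (1, 512, 1, 1) tensor at avgpool, which flattens to 512 values.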

A few other things:

  • You should run inference within a torch.no_grad() context to avoid tracking gradients, since you won't need them. Note that model.eval() only changes the behavior of certain layers, such as dropout and batch normalization; it does not disable construction of the computation graph, whereas torch.no_grad() does.

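  • I assume this is just a copy-paste issue, but transforms is both the name of an imported module and the name of a global variable, so the assignment shadows the module. The question's code also refers to torchvision.transforms without ever importing torchvision.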

  • o.data does not do anything useful here: it returns a tensor that shares storage with o but is detached from the autograd graph. In the old Variable interface (circa PyTorch 0.3.1 and earlier) this used to be necessary, but that interface was deprecated back in PyTorch 0.4.0, when Variable and Tensor were merged, and its use now just creates confusion. Unfortunately, many tutorials are still being written using this old and unnecessary interface.
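The difference between model.eval() and torch.no_grad() mentioned in the first point can be seen on a toy model. This is a small sketch under stated assumptions: the two-layer `model` below is hypothetical and merely contains a dropout layer so that eval mode has something to change.

```python
import torch
import torch.nn as nn

# Hypothetical toy model; dropout is included because eval() changes its behavior.
model = nn.Sequential(nn.Linear(4, 3), nn.Dropout(p=0.5))
model.eval()
x = torch.rand(1, 4)

# eval mode alone: the output is still attached to a computation graph.
out_eval = model(x)

# Inside no_grad, no graph is recorded for the forward pass.
with torch.no_grad():
    out_no_grad = model(x)

print(out_eval.requires_grad)       # True
print(out_no_grad.requires_grad)    # False
print(out_no_grad.grad_fn is None)  # True
```

Because tensors produced under no_grad carry no gradient history, a forward hook that runs inside that context can use its output argument directly.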

The updated code is as follows:

import torch
import torchvision
import torchvision.models as models
from PIL import Image

img = Image.open("Documents/01235.png")

# Load the pretrained model
model = models.resnet18(pretrained=True)

# Use the model object to select the desired layer
layer = model._modules.get('avgpool')

# Set model to evaluation mode
model.eval()

transforms = torchvision.transforms.Compose([
    torchvision.transforms.Resize(256),
    torchvision.transforms.CenterCrop(224),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])


def get_vector(image):
    # Create a PyTorch tensor with the transformed image
    t_img = transforms(image)
    # Create a vector of zeros that will hold our feature vector
    # The 'avgpool' layer has an output size of 512
    my_embedding = torch.zeros(512)

    # Define a function that will copy the output of a layer
    def copy_data(m, i, o):
        my_embedding.copy_(o.flatten())                 # <-- flatten

    # Attach that function to our selected layer
    h = layer.register_forward_hook(copy_data)
    # Run the model on our transformed image
    with torch.no_grad():                               # <-- no_grad context
        model(t_img.unsqueeze(0))                       # <-- unsqueeze
    # Detach our copy function from the layer
    h.remove()
    # Return the feature vector
    return my_embedding


pic_vector = get_vector(img)
