How to format the image data for training/prediction when images are different in size?
Problem description
I am trying to train a model that classifies images. The problem is that the images have different sizes. How should I format my images and/or model architecture?
Recommended answer
You didn't say what architecture you're talking about. Since you said you want to classify images, I'm assuming it's a partly convolutional, partly fully connected network like AlexNet, GoogLeNet, etc. In general, the answer to your question depends on the type of network you are working with.
If, for example, your network contains only convolutional units (that is to say, no fully connected layers), it can be invariant to the input image's size. Such a network could process the input images and in turn return another image ("convolutional all the way"); you would have to make sure that the output matches what you expect, since you have to compute the loss in some way, of course.
If you are using fully connected units, though, you're in for trouble: the network has a fixed number of learned weights to work with, so inputs of varying size would require a varying number of weights, and that's not possible.
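To make the weight-count argument concrete, here is a minimal sketch that counts how many values reach the first fully connected layer for two input sizes. The layer stack (three 3x3 convolutions with stride 2 and padding 1), the channel count and the layer width are hypothetical values picked for illustration; they are not taken from AlexNet or GoogLeNet.

```python
def conv_output_size(size, kernel, stride, padding=0):
    """Spatial output size of one convolution along a single dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def flattened_features(height, width, layers, channels):
    """Number of values handed to the first fully connected layer.

    `layers` is a list of (kernel, stride, padding) tuples; `channels` is
    the channel count of the last convolutional layer.
    """
    for kernel, stride, padding in layers:
        height = conv_output_size(height, kernel, stride, padding)
        width = conv_output_size(width, kernel, stride, padding)
    return height * width * channels


# Hypothetical stack: three 3x3 convolutions with stride 2 and padding 1.
LAYERS = [(3, 2, 1)] * 3
CHANNELS = 64   # assumed channel count of the last conv layer
FC_UNITS = 10   # assumed width of the first fully connected layer

for h, w in [(224, 224), (160, 120)]:
    features = flattened_features(h, w, LAYERS, CHANNELS)
    print(f"{h}x{w}: {features} features, {features * FC_UNITS} FC weights")
```

The convolutional part happily produces a feature map for either input, but the two sizes would need differently shaped fully connected weight matrices, which is why the input size has to be fixed somewhere before that layer.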
If that is your problem, here are some things you can do:
- Don't care about squashing the images. A network might learn to make sense of the content anyway; do scale and perspective mean anything to the content in the first place?
- Center-crop the images to a specific size. If you fear you're losing data, do multiple crops and use these to augment your input data, so that the original image is split into N different images of the correct size.
- Pad the images with a solid color to a square size, then resize.
- Do a combination of these.
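The cropping and padding options come down to a bit of box arithmetic. The sketch below only computes coordinates, so it can be combined with whatever image library you use; the function names and the choice of N = 5 crops (four corners plus the center) are assumptions made for illustration.

```python
def center_crop_box(height, width, crop_h, crop_w):
    """Return (top, left, bottom, right) of a centered crop."""
    if crop_h > height or crop_w > width:
        raise ValueError("crop is larger than the image")
    top = (height - crop_h) // 2
    left = (width - crop_w) // 2
    return top, left, top + crop_h, left + crop_w


def multi_crop_boxes(height, width, crop_h, crop_w):
    """Return N = 5 crop boxes: the four corners plus the center."""
    center = center_crop_box(height, width, crop_h, crop_w)
    max_top = height - crop_h
    max_left = width - crop_w
    corners = [(0, 0), (0, max_left), (max_top, 0), (max_top, max_left)]
    boxes = [(t, l, t + crop_h, l + crop_w) for t, l in corners]
    boxes.append(center)
    return boxes


def pad_to_square(height, width):
    """Return (pad_top, pad_bottom, pad_left, pad_right) for a square canvas."""
    side = max(height, width)
    pad_top = (side - height) // 2
    pad_left = (side - width) // 2
    return pad_top, side - height - pad_top, pad_left, side - width - pad_left


# A 100x60 (height x width) image, cropped to 50x50 or padded to 100x100.
print(center_crop_box(100, 60, 50, 50))
print(multi_crop_boxes(100, 60, 50, 50))
print(pad_to_square(100, 60))
```

After padding to a square, a plain resize to the network's input size no longer distorts the aspect ratio.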
The padding option might introduce an additional error source to the network's prediction, as the network might (read: likely will) be biased toward images that contain such a padded border.
If you need some ideas, have a look at the Images section of the TensorFlow documentation; there are pieces like resize_image_with_crop_or_pad that take care of most of the work.
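To show what a crop-or-pad step does, here is a rough pure-Python sketch on a single-channel image stored as a list of rows. It is an illustration written under the assumption of centered cropping and centered padding with a constant fill value, not TensorFlow's implementation; in TensorFlow 2 the corresponding function is named tf.image.resize_with_crop_or_pad.

```python
def crop_or_pad(image, target_h, target_w, fill=0):
    """Center-crop dimensions that are too large, center-pad those too small."""
    height, width = len(image), len(image[0])

    # Crop: keep at most target_h x target_w pixels around the center.
    keep_h, keep_w = min(height, target_h), min(width, target_w)
    top = (height - keep_h) // 2
    left = (width - keep_w) // 2
    cropped = [row[left:left + keep_w] for row in image[top:top + keep_h]]

    # Pad: distribute the missing rows/columns around the cropped region.
    pad_top = (target_h - keep_h) // 2
    pad_bottom = target_h - keep_h - pad_top
    pad_left = (target_w - keep_w) // 2
    pad_right = target_w - keep_w - pad_left

    body = [[fill] * pad_left + row + [fill] * pad_right for row in cropped]
    above = [[fill] * target_w for _ in range(pad_top)]
    below = [[fill] * target_w for _ in range(pad_bottom)]
    return above + body + below


big = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(crop_or_pad(big, 2, 2))    # too large: center crop
print(crop_or_pad([[1]], 3, 3))  # too small: padded border
```

The second call makes the bias risk visible: most of the resulting image is fill value, and that border is exactly what the network may learn to key on.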
As for simply not caring about squashing, here's a piece of the preprocessing pipeline of the famous Inception network:
# This resizing operation may distort the images because the aspect
# ratio is not respected. We select a resize method in a round robin
# fashion based on the thread number.
# Note that ResizeMethod contains 4 enumerated resizing methods.
# We select only 1 case for fast_mode bilinear.
num_resize_cases = 1 if fast_mode else 4
distorted_image = apply_with_random_selector(
    distorted_image,
    lambda x, method: tf.image.resize_images(x, [height, width], method=method),
    num_cases=num_resize_cases)
They're fully aware of the distortion and do it anyway.
Depending on how far you want or need to go, there is also a paper called Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition that handles inputs of arbitrary size by processing them in a very special way.
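The core idea of spatial pyramid pooling can be sketched in a few lines: pool the last feature map over a fixed grid of bins at several scales, so the output length depends only on the pyramid levels and not on the input size. The version below is a simplified single-channel illustration; the levels (1, 2, 4) and the floor/ceil rule for bin boundaries are assumptions made here, not the paper's exact implementation.

```python
def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a 2-D feature map over n x n bins for each pyramid level n.

    Returns a flat list of sum(n * n for n in levels) values, whatever the
    height and width of `feature_map` are.
    """
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            # Row range of bin i: floor(i * h / n) up to ceil((i + 1) * h / n).
            r0 = (i * height) // n
            r1 = ((i + 1) * height + n - 1) // n
            for j in range(n):
                c0 = (j * width) // n
                c1 = ((j + 1) * width + n - 1) // n
                pooled.append(max(feature_map[r][c]
                                  for r in range(r0, r1)
                                  for c in range(c0, c1)))
    return pooled


# Two feature maps of different size yield vectors of the same length.
small = [[r * 7 + c for c in range(7)] for r in range(5)]
large = [[r * 9 + c for c in range(9)] for r in range(13)]
print(len(spatial_pyramid_pool(small)), len(spatial_pyramid_pool(large)))
```

Because the pooled vector has a fixed length, a fully connected layer can follow it even though the images, and therefore the feature maps, differ in size.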