Questions about loss function in yolov2?

Problem Description

I read the YOLOv2 implementation. I have some questions about its loss. Below is the pseudocode of the loss function; I hope I got it right.

import numpy as np

costs = np.zeros(output.shape)
# Pass 1: every predicted box gets a "no-object" and a "prior-matching" cost by default.
for pred_box in all prediction boxes:
    if (max IoU pred_box has with all truth boxes < threshold):
        costs[pred_box][obj] = (sigmoid(obj)-0)^2 * 1      # push objectness toward 0
    else:
        costs[pred_box][obj] = 0                           # ignore boxes that overlap a truth enough
    costs[pred_box][x] = (sigmoid(x)-0.5)^2 * 0.01         # push x, y toward the cell center
    costs[pred_box][y] = (sigmoid(y)-0.5)^2 * 0.01
    costs[pred_box][w] = (w-0)^2 * 0.01                    # push w, h toward the anchor shape
    costs[pred_box][h] = (h-0)^2 * 0.01
# Pass 2: the one responsible box per ground truth overwrites those defaults.
for truth_box in all ground truth boxes:
    pred_box = the one prediction box that is supposed to predict truth_box
    scale = 2 - (truew*trueh) / (imagew*imageh)            # weight small objects more
    costs[pred_box][obj] = (1-sigmoid(obj))^2 * 5
    costs[pred_box][x] = (sigmoid(x)-truex)^2 * scale
    costs[pred_box][y] = (sigmoid(y)-truey)^2 * scale
    costs[pred_box][w] = (w-log(truew/anchorw))^2 * scale
    costs[pred_box][h] = (h-log(trueh/anchorh))^2 * scale
    costs[pred_box][classes] = softmax_euclidean
total_loss = sum(costs)
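
For context, the x, y, w, h terms above follow YOLOv2's box parameterization: the network predicts raw values t_x, t_y, t_w, t_h per anchor, the box is decoded as b_x = sigmoid(t_x) + c_x (an offset inside the cell) and b_w = anchor_w * exp(t_w), and the regression targets are the inverse of that decode. Below is a minimal NumPy sketch of the decode and the matching target encoding; the function and variable names are my own, not taken from the darknet source.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def decode_box(tx, ty, tw, th, cell_x, cell_y, anchor_w, anchor_h):
    """Raw network outputs -> box in grid-cell units (YOLOv2 parameterization)."""
    bx = sigmoid(tx) + cell_x          # center x, offset from the cell's top-left corner
    by = sigmoid(ty) + cell_y
    bw = anchor_w * np.exp(tw)         # width as a multiple of the anchor width
    bh = anchor_h * np.exp(th)
    return bx, by, bw, bh

def encode_target(true_x, true_y, true_w, true_h, cell_x, cell_y, anchor_w, anchor_h):
    """Ground truth -> the values the raw outputs are regressed toward."""
    target_x = true_x - cell_x              # compared against sigmoid(tx), so it lies in (0, 1)
    target_y = true_y - cell_y
    target_w = np.log(true_w / anchor_w)    # compared against tw directly, hence the log in the loss
    target_h = np.log(true_h / anchor_h)
    return target_x, target_y, target_w, target_h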

I have some questions about it:

1. The code randomly resizes the training images to dimensions between 320 and 608 every 10 batches, but the anchor boxes aren't resized accordingly. Why not resize the anchors too? I mean, you selected a set of the most common anchors for a 13*13 feature map; those anchors won't be common in a 19*19 feature map, so why not resize the anchors according to the image size?

2. Is applying a cost to the x, y, w, h predictions of boxes that aren't assigned a truth, which by default pushes w, h to exactly fit the anchor and x, y to the cell center, helpful, and why? Why not apply the location-prediction cost only to the boxes that are assigned a truth and ignore the unassigned ones?

3. Why not simply apply (obj-0)^2 as the cost of the obj prediction for all boxes with no truth assigned? In YOLOv2, not every box with no truth assigned has a cost applied to its obj prediction; only those with no truth assigned that also don't overlap much with any truth get a cost. Why is that? It seems complicated.
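
To make the rule described in question 3 concrete, here is a minimal NumPy sketch of that masking step as I understand it (the helper names and the 0.6 default are illustrative, not lifted from the darknet code): only predicted boxes whose best IoU with every ground truth falls below the threshold receive the no-object cost.

import numpy as np

def iou(boxes_a, boxes_b):
    """Pairwise IoU between (N, 4) and (M, 4) boxes given as (x1, y1, x2, y2)."""
    lt = np.maximum(boxes_a[:, None, :2], boxes_b[None, :, :2])   # (N, M, 2) intersection top-left
    rb = np.minimum(boxes_a[:, None, 2:], boxes_b[None, :, 2:])   # (N, M, 2) intersection bottom-right
    wh = np.clip(rb - lt, 0.0, None)
    inter = wh[..., 0] * wh[..., 1]
    area_a = (boxes_a[:, 2] - boxes_a[:, 0]) * (boxes_a[:, 3] - boxes_a[:, 1])
    area_b = (boxes_b[:, 2] - boxes_b[:, 0]) * (boxes_b[:, 3] - boxes_b[:, 1])
    return inter / (area_a[:, None] + area_b[None, :] - inter)

def noobj_mask(pred_boxes, truth_boxes, ignore_thresh=0.6):
    """True for predicted boxes that get the (sigmoid(obj)-0)^2 no-object cost:
    those whose best IoU with every ground truth is below the threshold."""
    best_iou = iou(pred_boxes, truth_boxes).max(axis=1)
    return best_iou < ignore_thresh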

Solution

1

In the implementation of YOLOv2, random cropping is used to augment the training data. Random cropping crops a part of the image and expands it so that it has the same size as the original.

This augmentation of the training data makes the trained network robust to object sizes that it has not seen in the training data. So the anchor boxes should not be changed through this process.

Remember that anchor boxes are assumptions about the shapes of objects, fixed before training and prediction. If the network relied only on such an assumption, it would not be robust to objects whose shapes differ much from it. Data augmentation addresses this problem.
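
Purely as an illustration of that crop-and-rescale idea, here is a small NumPy sketch; the crop-size policy, the nearest-neighbor rescale, and the function name are my own choices, not darknet's implementation.

import numpy as np

def random_crop_and_rescale(image, rng, min_frac=0.6):
    """Crop a random sub-window and rescale it back to the original size
    (nearest-neighbor here just to keep the sketch dependency-free)."""
    h, w = image.shape[:2]
    ch = rng.integers(int(min_frac * h), h + 1)     # crop height
    cw = rng.integers(int(min_frac * w), w + 1)     # crop width
    top = rng.integers(0, h - ch + 1)
    left = rng.integers(0, w - cw + 1)
    crop = image[top:top + ch, left:left + cw]
    # Rescale back to (h, w): objects now appear at a different scale than before.
    rows = (np.arange(h) * ch / h).astype(int)
    cols = (np.arange(w) * cw / w).astype(int)
    return crop[rows][:, cols]

# usage: augmented = random_crop_and_rescale(image, np.random.default_rng(0))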

2

This is because we don't know the truth for the center coordinates and the box shape. When we train YOLO, we use the concept of "responsible" boxes. They are the boxes that are to be updated through the training process.

Please see the section "'Responsible' Bounding Boxes" of my Medium post.
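
As a rough sketch of how a "responsible" box is typically chosen in YOLOv2-style training (my own summary, not a transcription of the code in that post): the ground truth is assigned to the grid cell containing its center, and within that cell to the anchor whose shape best matches it by IoU.

import numpy as np

def responsible_box(true_cx, true_cy, true_w, true_h, anchors, grid_size):
    """Return (cell_row, cell_col, anchor_index) of the box responsible for this truth.
    true_* are normalized to [0, 1]; anchors is a list of (w, h) in the same units."""
    cell_col = min(int(true_cx * grid_size), grid_size - 1)   # cell containing the truth center
    cell_row = min(int(true_cy * grid_size), grid_size - 1)
    best_iou, best_a = -1.0, 0
    for a, (aw, ah) in enumerate(anchors):
        # Compare shapes only: both boxes are treated as centered at the origin.
        inter = min(true_w, aw) * min(true_h, ah)
        union = true_w * true_h + aw * ah - inter
        if inter / union > best_iou:
            best_iou, best_a = inter / union, a
    return cell_row, cell_col, best_a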

3

This is because the output of YOLO comes directly from a convolutional layer, not from a fully connected activation, so it is not restricted to lie between 0 and 1. We therefore apply a sigmoid function so that it represents a probability.
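
In other words, the objectness value coming straight out of the convolution can be any real number, and the sigmoid squashes it into (0, 1) so it can be compared against 0 or 1 in the loss. A tiny illustration:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

raw_obj = np.array([-3.2, 0.0, 4.1])   # unbounded values straight from the conv layer
print(sigmoid(raw_obj))                # ~[0.039, 0.5, 0.984], usable as probabilities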
