Updating a TensorFlow Object Detection model with new images


Question

I have trained a Faster R-CNN model on a custom dataset using TensorFlow's Object Detection API. Over time I would like to keep updating the model with additional images (collected weekly). The goal is to optimize for accuracy and to weight newer images more heavily over time.

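Here are some alternatives:

  1. Add the images to the previous dataset and train a completely new model
  2. Add the images to the previous dataset and continue training the previous model
  3. Create a new dataset containing only the new images and continue training the previous model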

Here are my thoughts. Option 1: would be more time-consuming, but all images would be treated "equally".

Option 2: would likely take less additional training time, but one concern is that the algorithm might weight the earlier images more heavily.

Option 3: This seems like the best option. Take the original model and simply focus on training on the new images.

Is one of these clearly better? What would be the pros/cons of each?

In addition, I'd like to know whether it's better to keep one fixed test set as a control for accuracy, or to create a new one each time that includes newer images. Perhaps I could add some portion of the new images to the training data and another portion to the test set, and then feed older test set images back into training (or throw them out)?
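One way to picture the "portion to the model, portion to the test set" idea is a deterministic split keyed on the image ID, so a given image always lands in the same split no matter which week it is processed. This is only a sketch: the function names, the 20% test fraction, and the use of string image IDs are assumptions for illustration, not part of the Object Detection API.

```python
import hashlib


def assign_split(image_id: str, test_fraction: float = 0.2) -> str:
    """Deterministically assign an image to 'train' or 'test' from its ID."""
    digest = hashlib.sha256(image_id.encode("utf-8")).hexdigest()
    # Map the first 32 bits of the hash to a value in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    return "test" if bucket < test_fraction else "train"


def split_weekly_batch(image_ids, test_fraction: float = 0.2):
    """Split one week's image IDs into (train, test) lists."""
    train, test = [], []
    for image_id in image_ids:
        if assign_split(image_id, test_fraction) == "test":
            test.append(image_id)
        else:
            train.append(image_id)
    return train, test
```

Because the assignment depends only on the ID, an image that has gone into the test set never leaks into training in a later week, and a frozen copy of the first test set can still be kept separately as a control.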

Recommended answer

Consider the case where your dataset is nearly perfect. If you ran the model on the new images (collected weekly), the results (i.e. boxes with scores) would be exactly what you want from the model, and it would be pointless to add these images to the dataset because the model would not learn anything new from them.

For an imperfect dataset, results on the new images will show (some) errors, and those images are appropriate for further training. But there may also be "bad" images already in the dataset, and it is desirable to remove them. This indicates that Option 1 must occur, on some schedule, to remove the effect of "bad" images entirely.
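Picking out the new images on which the model makes errors could look roughly like the sketch below. It assumes the new images have already been labeled by hand, that boxes are `(ymin, xmin, ymax, xmax)` tuples, and that predictions and labels are plain dictionaries; the helper names and the 0.5 thresholds are hypothetical choices, not values from the answer.

```python
def iou(box_a, box_b):
    """Intersection over union of two (ymin, xmin, ymax, xmax) boxes."""
    ymin, xmin = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ymax, xmax = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ymax - ymin) * max(0.0, xmax - xmin)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def matches(pred, truth, iou_threshold):
    """True if a prediction and a labeled box agree on class and location."""
    return (pred["label"] == truth["label"]
            and iou(pred["box"], truth["box"]) >= iou_threshold)


def has_error(predicted, labeled, score_threshold=0.5, iou_threshold=0.5):
    """True if a labeled box is missed or a confident prediction is spurious."""
    confident = [p for p in predicted if p["score"] >= score_threshold]
    missed = any(not any(matches(p, t, iou_threshold) for p in confident)
                 for t in labeled)
    spurious = any(not any(matches(p, t, iou_threshold) for t in labeled)
                   for p in confident)
    return missed or spurious
```

Images for which `has_error` returns `True` are the candidates for further training; the rest would add little under the "nearly perfect" reasoning above.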

On a shorter schedule, Option 3 is appropriate if the new images are reasonably balanced across the domain categories (that is, in some sense a representative subset of the previous dataset).
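Whether a weekly batch is "reasonably balanced" could be checked by comparing its class distribution with that of the previous dataset. The sketch below uses total variation distance as the comparison; that metric, the function names, and any cutoff applied to the result are assumptions for illustration, since the answer does not name a specific test.

```python
from collections import Counter


def class_distribution(labels):
    """Return the fraction of annotations belonging to each class."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("labels must not be empty")
    return {cls: n / total for cls, n in counts.items()}


def total_variation_distance(p, q):
    """0.0 for identical distributions, 1.0 for completely disjoint ones."""
    classes = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in classes)
```

A small distance between `class_distribution(old_labels)` and `class_distribution(new_labels)` suggests the new batch is representative enough for Option 3; a large one suggests falling back to Option 2 or Option 1.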

Option 2 seems pretty safe and is easier to understand. When you say "the algorithm might be weighting the earlier images more", I don't see why this is a problem if the earlier images are "good". However, I can see that the domain may change over time (evolve), in which case you may well wish to down-weight older images. I understand that you can modify the training data to do just that, as discussed in this question:

Class weights for balancing data in TensorFlow Object Detection API
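Independently of the class-weight approach in the linked question, one simple way to modify the training data so that older images count less is to resample the image list with weights that decay with age. This is a sketch under stated assumptions: the exponential decay, the 8-week half-life, and the function names are illustrative choices, and the resampled IDs would still need to be written out as training records by whatever tooling you already use.

```python
import random


def recency_weights(ages_in_weeks, half_life_weeks=8.0):
    """Weight 1.0 for a brand-new image, halving every `half_life_weeks`."""
    return [0.5 ** (age / half_life_weeks) for age in ages_in_weeks]


def sample_training_ids(image_ids, ages_in_weeks, k,
                        half_life_weeks=8.0, seed=0):
    """Draw k image IDs with replacement, favoring recent images."""
    rng = random.Random(seed)  # fixed seed keeps the draw reproducible
    weights = recency_weights(ages_in_weeks, half_life_weeks)
    return rng.choices(image_ids, weights=weights, k=k)
```

Sampling with replacement means recent images may appear several times in the rebuilt training set while old ones appear rarely, which shifts the effective weighting without removing old images outright.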
