Best practice for video ground truthing?


Question


I would like to train a deep learning model (using TensorFlow) for object detection on a new object category.

As the source for ground truthing, I have multiple video files that contain the object (it occupies only part of each frame).

How should I ground truth the video? Should I extract the frames one by one and label every frame, even though consecutive frames will be quite similar? Or what would be the best practice for such a task?

Open source tools are preferred.

Solution

It usually works as you described, at least for iteration zero:

  1. collect required examples (video)
  2. extract valuable frames from the video (manual or partially automated process)
  3. use OpenCV (or any other tool) to extract required details (bounding box, accurate mask)
  4. assemble a training set
  5. train a model
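
As a rough illustration of the partially automated variant of step 2, here is a minimal sketch that drops near-duplicate frames so that only visually distinct ones go on to labeling. It is not part of the original workflow: the function names, the `step` subsampling, and the mean-absolute-difference measure with its threshold are all assumptions chosen for the example, and the threshold needs tuning per video.

```python
from typing import Any, Callable, Iterable, Iterator, List, Tuple


def iter_video_frames(path: str, step: int = 1) -> Iterator[Any]:
    """Yield every `step`-th frame of a video file as a BGR array."""
    import cv2  # imported lazily so the selection logic has no hard dependency

    capture = cv2.VideoCapture(path)
    if not capture.isOpened():
        raise IOError(f"cannot open video: {path}")
    try:
        index = 0
        while True:
            ok, frame = capture.read()
            if not ok:  # end of stream or read error
                break
            if index % step == 0:
                yield frame
            index += 1
    finally:
        capture.release()


def mean_abs_diff(a: Any, b: Any) -> float:
    """Mean absolute grayscale difference between two BGR frames (0..255)."""
    import cv2

    gray_a = cv2.cvtColor(a, cv2.COLOR_BGR2GRAY)
    gray_b = cv2.cvtColor(b, cv2.COLOR_BGR2GRAY)
    return float(cv2.absdiff(gray_a, gray_b).mean())


def select_distinct_frames(
    frames: Iterable[Any],
    diff: Callable[[Any, Any], float],
    threshold: float,
) -> List[Tuple[int, Any]]:
    """Keep a frame only if it differs from the last KEPT frame by more than
    `threshold`. Returns (position in the input sequence, frame) pairs."""
    kept: List[Tuple[int, Any]] = []
    last = None
    for index, frame in enumerate(frames):
        if last is None or diff(last, frame) > threshold:
            kept.append((index, frame))
            last = frame
    return kept


def extract_to_dir(video_path: str, out_dir: str, threshold: float = 8.0) -> int:
    """Write the distinct frames of one video as PNG files; returns the count.
    The default threshold is an arbitrary starting point, not a recommendation."""
    import os
    import cv2

    os.makedirs(out_dir, exist_ok=True)
    frames = iter_video_frames(video_path, step=5)
    kept = select_distinct_frames(frames, mean_abs_diff, threshold)
    for index, frame in kept:
        cv2.imwrite(os.path.join(out_dir, f"frame_{index:06d}.png"), frame)
    return len(kept)
```

Comparing against the last kept frame, rather than the immediately preceding one, means slow gradual changes still produce a new sample eventually. A call such as `extract_to_dir("input.mp4", "frames")` (both paths are placeholders) would then feed the labeling step.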

Here is an example of a training set, produced by the approach described above (see it in action)

For iteration one, you can use the iteration-zero model to improve steps 2 and 3 significantly and grow the training set even further.
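
One possible reading of this, sketched under assumptions: run the iteration-zero model over new frames to propose labels, accept the confident ones as drafts, and send the rest to manual review. The `predict` callable is hypothetical (a wrapper around whatever model iteration zero produced), and the `review_below` cutoff is an arbitrary illustrative value.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

Box = Tuple[int, int, int, int]   # x, y, width, height in pixels
Detection = Tuple[Box, float]     # (box, confidence score in 0..1)


def prelabel(
    frame_ids: Iterable[Hashable],
    predict: Callable[[Hashable], List[Detection]],
    review_below: float = 0.8,
) -> Tuple[Dict[Hashable, List[Detection]], List[Hashable]]:
    """Propose labels with an existing model and flag frames for manual review.

    A frame is flagged when the model found nothing or when any detection
    scores below `review_below`. `predict` is a hypothetical wrapper around
    the iteration-zero model; it is not an API of any particular library.
    """
    proposals: Dict[Hashable, List[Detection]] = {}
    needs_review: List[Hashable] = []
    for frame_id in frame_ids:
        detections = predict(frame_id)
        proposals[frame_id] = detections
        if not detections or min(score for _, score in detections) < review_below:
            needs_review.append(frame_id)
    return proposals, needs_review
```

Even the proposals that are not flagged are worth spot-checking, since an early model can be confidently wrong; the point of the split is only to focus manual effort where the model is least sure.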

I'm trying to solve pretty much the same problem, because it is hard to produce a training set to get accurate segmentation:

(again, here it is in action and other examples)

Basically, start with a semi-manual approach and try to evolve.
