Best practice for video ground truthing?
I would like to train a deep learning model (using TensorFlow) for object detection with a new object category.
As the source for ground truthing, I have multiple video files that contain the object (the object occupies only part of each image).
How should I ground truth the video? Should I extract it frame by frame and label every frame, even though consecutive frames will be quite similar? Or what would be best practice for such a task?
Open source tools are preferred.
It usually works as you described, at least for iteration zero:
- collect required examples (video)
- extract valuable frames from the video (manual or partially automated process)
- use OpenCV (or any other tool) to extract required details (bounding box, accurate mask)
- assemble a training set
- train a model
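The "partially automated" part of step 2 can be sketched as a filter that skips frames that look nearly identical to the last frame kept, so you do not label long runs of similar frames. This is only a minimal sketch under assumptions: the function names, the mean-absolute-difference measure, the 64x36 thumbnail size, and the threshold of 10 are all illustrative choices and not something from the answer above. The selection logic is plain Python; only `iter_video_frames` needs the `opencv-python` package.

```python
from typing import Iterable, Iterator, List, Sequence, Tuple

Frame = Sequence[int]  # flattened grayscale pixels, each in 0..255


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average per-pixel absolute difference between two equally sized frames."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("frames must be non-empty and have the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_distinct_frames(frames: Iterable[Frame], threshold: float = 10.0) -> List[int]:
    """Return indices of frames that differ enough from the last kept frame.

    The first frame is always kept. The threshold is an assumed starting
    value and should be tuned per video.
    """
    kept: List[int] = []
    last = None
    for index, frame in enumerate(frames):
        if last is None or mean_abs_diff(frame, last) >= threshold:
            kept.append(index)
            last = frame
    return kept


def iter_video_frames(path: str, size: Tuple[int, int] = (64, 36)) -> Iterator[Frame]:
    """Yield small grayscale thumbnails of each frame in a video file."""
    import cv2  # imported lazily so the selection logic works without OpenCV

    capture = cv2.VideoCapture(path)
    try:
        while True:
            ok, frame = capture.read()
            if not ok:  # end of file or read error
                break
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            yield cv2.resize(gray, size).flatten().tolist()
    finally:
        capture.release()
```

A typical call would be `select_distinct_frames(iter_video_frames("clip.mp4"))`, where `clip.mp4` is a placeholder path; the returned indices are the frames worth sending to a labeling tool. Comparing against the last kept frame, and not the immediately preceding one, means slow drift in the scene still eventually produces a new frame to label.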
Here is an example of a training set, produced by the approach described above (see it in action)
For iteration one, you can use the iteration-zero model to significantly improve steps 2 and 3, and so grow the training set even further.
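One way to picture that second iteration is to let the iteration-zero model propose labels and then sort frames by how much human attention they still need. The sketch below is an assumption-laden illustration, not the workflow the answer actually used: `detect` is a hypothetical callable wrapping whatever model you trained, and the two score thresholds are arbitrary starting points.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

Box = Tuple[int, int, int, int]   # (x_min, y_min, x_max, y_max) in pixels
Detection = Tuple[Box, float]     # a box plus a confidence score in [0, 1]


def triage_frames(
    frame_ids: Iterable[Hashable],
    detect: Callable[[Hashable], List[Detection]],  # hypothetical model wrapper
    accept_at: float = 0.9,   # assumed threshold: trust the proposal as-is
    review_at: float = 0.3,   # assumed threshold: worth a quick human check
) -> Dict[str, list]:
    """Split frames into queues based on the best detection score per frame."""
    queues: Dict[str, list] = {"auto": [], "review": [], "unknown": []}
    for frame_id in frame_ids:
        detections = detect(frame_id)
        # A frame with no detections is treated as score 0.0.
        best = max((score for _, score in detections), default=0.0)
        if best >= accept_at:
            queues["auto"].append(frame_id)
        elif best >= review_at:
            queues["review"].append(frame_id)
        else:
            queues["unknown"].append(frame_id)
    return queues
```

Frames in `auto` can be added to the training set with the proposed boxes, frames in `review` need a person to correct the boxes, and frames in `unknown` should still be sampled by hand, because a low score can mean either that the object is absent or that the early model missed it.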
I'm trying to solve pretty much the same problem, because it is hard to produce a training set to get accurate segmentation:
(again, here it is in action and other examples)
Basically, start with a semi-manual approach and try to evolve.