How to Batch Multiple Video Frames Before Running a TensorFlow Inference Session


Question

I made a project that basically uses Google's object detection API with TensorFlow.

All I am doing is inference with a pre-trained model: real-time object detection, where the input is the video stream of a webcam or something similar, using OpenCV.

Right now I get pretty decent performance results, but I want to further increase the FPS.

What I experience is that TensorFlow uses my whole memory during inference, but GPU usage is not maxed out at all (around 40% on an NVIDIA GTX 1050 laptop, and 6% on an NVIDIA Jetson TX2).

So my idea was to increase GPU usage by increasing the image batch size that is fed into each session run.

So my question is: how can I batch multiple frames of the input video stream together before I feed them to sess.run()?

Have a look at my code in object_detetection.py on my GitHub repo: https://github.com/GustavZ/realtime_object_detection.

I would be very thankful if you could come up with some hints or code implementations!

import numpy as np
import os
import six.moves.urllib as urllib
import tarfile
import tensorflow as tf
import cv2


# Protobuf compilation (only needed once)
os.system('protoc object_detection/protos/*.proto --python_out=.')

from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as vis_util
from stuff.helper import FPS2, WebcamVideoStream

# INPUT PARAMS
# Must be OpenCV readable
# 0 = Default Camera
video_input = 0
visualize = True
max_frames = 300 #only used if visualize==False
width = 640
height = 480
fps_interval = 3
bbox_thickness = 8

# Model preparation
# What model to download.
MODEL_NAME = 'ssd_mobilenet_v1_coco_2017_11_17'
MODEL_FILE = MODEL_NAME + '.tar.gz'
DOWNLOAD_BASE = 'http://download.tensorflow.org/models/object_detection/'
# Path to frozen detection graph. This is the actual model that is used for the object detection.
PATH_TO_CKPT = 'models/' + MODEL_NAME + '/frozen_inference_graph.pb'
# List of strings used to add the correct label to each box.
LABEL_MAP = 'mscoco_label_map.pbtxt'
PATH_TO_LABELS = 'object_detection/data/' + LABEL_MAP
NUM_CLASSES = 90

# Download Model    
if not os.path.isfile(PATH_TO_CKPT):
    print('Model not found. Downloading it now.')
    opener = urllib.request.URLopener()
    opener.retrieve(DOWNLOAD_BASE + MODEL_FILE, MODEL_FILE)
    tar_file = tarfile.open(MODEL_FILE)
    for file in tar_file.getmembers():
      file_name = os.path.basename(file.name)
      if 'frozen_inference_graph.pb' in file_name:
        tar_file.extract(file, os.getcwd())
    os.remove(MODEL_FILE)  # the archive was downloaded to the current directory
else:
    print('Model found. Proceed.')

# Load a (frozen) Tensorflow model into memory.
detection_graph = tf.Graph()
with detection_graph.as_default():
  od_graph_def = tf.GraphDef()
  with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
    serialized_graph = fid.read()
    od_graph_def.ParseFromString(serialized_graph)
    tf.import_graph_def(od_graph_def, name='')

# Loading label map
label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)

# Start Video Stream
video_stream = WebcamVideoStream(video_input,width,height).start()
cur_frames = 0
# Detection
with detection_graph.as_default():
  with tf.Session(graph=detection_graph) as sess:
    # Define input and output tensors for detection_graph
    image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
    # Each box represents a part of the image where a particular object was detected.
    detection_boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
    # Each score represents the level of confidence for each of the objects.
    # Score is shown on the result image, together with the class label.
    detection_scores = detection_graph.get_tensor_by_name('detection_scores:0')
    detection_classes = detection_graph.get_tensor_by_name('detection_classes:0')
    num_detections = detection_graph.get_tensor_by_name('num_detections:0')
    # fps calculation
    fps = FPS2(fps_interval).start()
    print ("Press 'q' to Exit")
    while video_stream.isActive():
      image_np = video_stream.read()
      # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
      image_np_expanded = np.expand_dims(image_np, axis=0)
      # Actual detection.
      (boxes, scores, classes, num) = sess.run(
          [detection_boxes, detection_scores, detection_classes, num_detections],
          feed_dict={image_tensor: image_np_expanded})
      # Visualization of the results of a detection.
      vis_util.visualize_boxes_and_labels_on_image_array(
          image_np,
          np.squeeze(boxes),
          np.squeeze(classes).astype(np.int32),
          np.squeeze(scores),
          category_index,
          use_normalized_coordinates=True,
          line_thickness=bbox_thickness)
      if visualize:
          cv2.imshow('object_detection', image_np)
          # Exit Option
          if cv2.waitKey(1) & 0xFF == ord('q'):
              break
      else:
          cur_frames += 1
          if cur_frames >= max_frames:
              break
      # fps calculation
      fps.update()

# End everything
fps.stop()
video_stream.stop()     
cv2.destroyAllWindows()
print('[INFO] elapsed time (total): {:.2f}'.format(fps.elapsed()))
print('[INFO] approx. FPS: {:.2f}'.format(fps.fps()))

Answer

Well, I'd just collect batch_size frames and feed them:

batch_size = 5
while video_stream.isActive():
  image_np_list = []
  for _ in range(batch_size):
      image_np_list.append(video_stream.read())
      fps.update()
  # Stack the frames into a single batch: the model accepts shape [batch_size, height, width, 3]
  image_np_expanded = np.asarray(image_np_list)
  # Actual detection.
  (boxes, scores, classes, num) = sess.run(
      [detection_boxes, detection_scores, detection_classes, num_detections],
      feed_dict={image_tensor: image_np_expanded})

  # Visualization of the results of a detection.
  for i in range(batch_size):
      vis_util.visualize_boxes_and_labels_on_image_array(
          image_np_expanded[i],
          boxes[i],
          classes[i].astype(np.int32),
          scores[i],
          category_index,
          use_normalized_coordinates=True,
          line_thickness=bbox_thickness)
      if visualize:
          cv2.imshow('object_detection', image_np_expanded[i])
          # Exit Option
          if cv2.waitKey(1) & 0xFF == ord('q'):
              break

Of course you'll have to make the relevant changes after that if you're reading the results from the detection, since they will now have batch_size rows.
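
For example, reading the batched results back per frame could look like the sketch below; the 0.5 score threshold is an arbitrary example value, not something from the original code:

for i in range(batch_size):
    # num[i] holds the number of valid detections for frame i
    for j in range(int(num[i])):
        if scores[i][j] > 0.5:
            # Boxes are in normalized coordinates: [ymin, xmin, ymax, xmax]
            ymin, xmin, ymax, xmax = boxes[i][j]
            print('frame %d: class %d with score %.2f'
                  % (i, int(classes[i][j]), scores[i][j]))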

Be careful though: before TensorFlow 1.4 (I think), the object detection API only supports a batch size of 1 in image_tensor, so this will not work unless you upgrade your TensorFlow.
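
If you'd rather fail early with a clear message than hit a cryptic shape error, a small guard like this sketch could help (it only assumes tf.__version__, and takes the 1.4 cutoff from the caveat above):

from distutils.version import LooseVersion

if LooseVersion(tf.__version__) < LooseVersion('1.4.0'):
    raise RuntimeError('Batched image_tensor input reportedly needs TensorFlow >= 1.4; '
                       'found %s' % tf.__version__)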

Also note that your resulting FPS will be an average, but the frames within the same batch will actually be closer in time than frames from different batches (since you'll still need to wait for sess.run() to finish). The average should still be significantly better than your current FPS, although the maximum time between two consecutive frames should increase.
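
To illustrate with made-up numbers: if one sess.run() on a batch of 5 frames takes 250 ms, you average 20 FPS, but the 5 frames are grabbed almost back to back and are then followed by a roughly 250 ms gap until the next batch.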

If you want your frames to all have roughly the same interval between them, I guess you'll need more sophisticated tools like multithreading and queueing: one thread would read the images from the stream and store them in a queue, the other one would take them from the queue and call sess.run() on them asynchronously; it could also tell the first thread to hurry up or slow down depending on its own computing capacity. This is trickier to implement.
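
A rough, untested sketch of that idea, reusing the names from the code above (video_stream, sess, batch_size, the detection tensors) and the queue module from six.moves, which the question's code already depends on:

import threading
from six.moves import queue

# Bounded queue: put() blocks when inference falls behind, which acts as
# built-in "slow down" backpressure on the reader thread.
frame_queue = queue.Queue(maxsize=2 * batch_size)

def read_frames():
    while video_stream.isActive():
        frame_queue.put(video_stream.read())  # blocks while the queue is full

reader = threading.Thread(target=read_frames)
reader.daemon = True  # don't keep the process alive after the main loop exits
reader.start()

while video_stream.isActive():
    # Drain one batch from the queue and run inference on it, decoupled
    # from the frame grabbing.
    batch = [frame_queue.get() for _ in range(batch_size)]
    (boxes, scores, classes, num) = sess.run(
        [detection_boxes, detection_scores, detection_classes, num_detections],
        feed_dict={image_tensor: np.asarray(batch)})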
