Speeding up and understanding Python Keras predict method results analysis

Question

I'm using Keras and TensorFlow to perform object detection with standard YOLOv3 as well as YOLOv3-Tiny (about 10x faster). Everything works, but performance is fairly poor: I get about one frame every 2 seconds on the GPU and about one frame every 4 seconds on the CPU. Profiling the code shows that the decode_netout method takes a lot of the time. I was generally following this tutorial as an example.

  1. Can someone help walk me through what it's doing?
  2. Are there alternative methods built into TensorFlow (or other libraries) that could do these calculations? For example, I swapped out some custom Python for tf.image.non_max_suppression, and that improved performance quite a bit.

# https://keras.io/models/model/
yhat = model.predict(image, verbose=0, use_multiprocessing=True)
# define the probability threshold for detected objects
class_threshold = 0.6
boxes = list()
for i in range(len(yhat)):
    # decode the output of the network
    boxes += detect.decode_netout(yhat[i][0], anchors[i], class_threshold, input_h, input_w)

def decode_netout(netout, anchors, obj_thresh, net_h, net_w):
    grid_h, grid_w = netout.shape[:2]
    nb_box = 3
    netout = netout.reshape((grid_h, grid_w, nb_box, -1))
    boxes = []
    netout[..., :2]  = _sigmoid(netout[..., :2])
    netout[..., 4:]  = _sigmoid(netout[..., 4:])
    netout[..., 5:]  = netout[..., 4][..., np.newaxis] * netout[..., 5:]
    netout[..., 5:] *= netout[..., 5:] > obj_thresh

    for i in range(grid_h*grid_w):
        row = i // grid_w  # integer division: row must be a whole grid index in Python 3
        col = i % grid_w
        for b in range(nb_box):
            # index 4 (the 5th element) is the objectness score
            objectness = netout[int(row)][int(col)][b][4]
            if(objectness.all() <= obj_thresh): continue
            # first 4 elements are x, y, w, and h
            x, y, w, h = netout[int(row)][int(col)][b][:4]
            x = (col + x) / grid_w # center position, unit: image width
            y = (row + y) / grid_h # center position, unit: image height
            w = anchors[2 * b + 0] * np.exp(w) / net_w # unit: image width
            h = anchors[2 * b + 1] * np.exp(h) / net_h # unit: image height
            # last elements are class probabilities
            classes = netout[int(row)][col][b][5:]
            box = BoundBox(x-w/2, y-h/2, x+w/2, y+h/2, objectness, classes)
            boxes.append(box)
    return boxes

Recommended answer

I have a similar setup with a GPU and have been facing the same problem. I have been working on a YoloV3 Keras project and chased this exact issue for the past two weeks. After finally timing all my functions, I narrowed the issue down to 'def do_nms', which then led me to the function you posted above, 'def decode_netout'. The issue is that the Non-Max-Suppression is slow.

The solution I found was to change this line:

if(objectness.all() <= obj_thresh): continue

to this:

if (objectness <= obj_thresh).all(): continue
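
As far as I can tell, the reason this matters is that `objectness` is a single NumPy float that has already been through the sigmoid, so it is always nonzero. `objectness.all()` is therefore `True`, and `True <= obj_thresh` is `False` for any threshold below 1, so the original line never skips anything and every candidate box ends up in Non-Max-Suppression. Here is a minimal sketch of the difference; the scores are made up and the two helper functions are hypothetical names used only for illustration:

```python
import numpy as np

obj_thresh = 0.6
# Made-up objectness scores for four candidate boxes (already sigmoid-ed).
scores = np.array([0.01, 0.30, 0.59, 0.95], dtype=np.float32)

def skipped_original(objectness, thresh):
    # .all() reduces the score to a boolean first; True then compares as 1.
    return bool(objectness.all() <= thresh)

def skipped_fixed(objectness, thresh):
    # Compare against the threshold first, then reduce.
    return bool((objectness <= thresh).all())

kept_original = [float(s) for s in scores if not skipped_original(s, obj_thresh)]
kept_fixed = [float(s) for s in scores if not skipped_fixed(s, obj_thresh)]

# The original condition keeps all four boxes; the fixed one keeps only 0.95.
print(len(kept_original), len(kept_fixed))
```

With a 416x416 input the three YOLOv3 output scales produce 10,647 candidate boxes in total, so letting all of them through to a pairwise NMS loop is what makes it so slow.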

The performance difference is night and day. I am pushing near 30 FPS and everything is working much better.

Credit goes to this Git issue/solution:

https://github.com/experiencor/keras-yolo3/issues/177

It took me a while to figure this out, so I hope this helps others.
