How to get multiple bounding box coordinates in the TensorFlow Object Detection API

Problem description

I want to get the coordinates of multiple bounding boxes, along with the class of each box, and return them as a JSON file.

When I print boxes[] from the following code, it has a shape of (1, 300, 4), so there are 300 sets of coordinates in boxes[]. But only 2 boxes are drawn on my predicted image. I want the coordinates of the bounding boxes that are actually drawn on my image.

Also, how do I know which bounding box is mapped to which category/class in the image?

For example, say I have a dog and a person in an image. How would I know which bounding box corresponds to the dog class and which one to the person class? boxes[] gives us an array of shape (1, 300, 4) with no indication of which bounding box corresponds to which class in the image.

I followed this answer to pick bounding box coordinates out of the 300 entries in boxes[] using a score threshold.

I've tried getting the bounding box with the highest score, but that returns only a single bounding box even when the predicted image has several.

The coordinates of the highest-scoring bounding box don't even match the bounding box coordinates on the predicted image. How do I get the coordinates of the bounding boxes that are on my predicted image?

            vis_util.visualize_boxes_and_labels_on_image_array(
                image_np,
                np.squeeze(boxes),
                np.squeeze(classes).astype(np.int32),
                np.squeeze(scores),
                category_index,
                use_normalized_coordinates=True,
                line_thickness=8)
            im = Image.fromarray(image_np)

            true_boxes = boxes[0][scores[0]==scores.max()]    # Gives us the box with max score
            for i in range(true_boxes.shape[0]):   # rescaling the coordinates
                ymin = true_boxes[i,0]*height
                xmin = true_boxes[i,1]*width
                ymax = true_boxes[i,2]*height
                xmax = true_boxes[i,3]*width

The coordinates I get from the above code (xmin, ymin, xmax, ymax for the box with the max score) don't exactly match the bounding box coordinates on the predicted image. They are off by a few pixels. Also, I only get one bounding box even though the predicted image has multiple bounding boxes and multiple classes (e.g. a dog and a person).

I would like to return a JSON file with the image_name, the bounding_boxes, and the class corresponding to each bounding box.

Thanks, I'm new to this. Please ask if any part of the question is unclear.

Recommended answer

I followed this answer and was able to get all of my bounding box coordinates by keeping every box whose score is above a threshold:

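import cv2

# boxes and scores come from the detection run in the question;
# image is the input as a NumPy array, height and width are its size in pixels.
min_score_thresh = 0.60
# Keep every box whose score is above the threshold, not just the best one.
true_boxes = boxes[0][scores[0] > min_score_thresh]
for i in range(true_boxes.shape[0]):
    # Boxes are normalized [ymin, xmin, ymax, xmax]; rescale to pixels.
    ymin = int(true_boxes[i, 0] * height)
    xmin = int(true_boxes[i, 1] * width)
    ymax = int(true_boxes[i, 2] * height)
    xmax = int(true_boxes[i, 3] * width)

    # Crop the detected region and save it as its own image.
    roi = image[ymin:ymax, xmin:xmax].copy()
    cv2.imwrite("box_{}.jpg".format(i), roi)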
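
The snippet above only recovers the boxes. For the class of each box and the JSON output asked for in the question, here is a minimal sketch rather than a definitive implementation. It relies on boxes, classes and scores being index-aligned (entry i of each array describes the same detection), so one score mask can be applied to all three. The helper name detections_to_records, the JSON key names, the sample arrays and the file name image1.jpg are made up for illustration; category_index is assumed to map a class id to a dict with a "name" key, as in the question's visualization call.

```python
import json

import numpy as np


def detections_to_records(boxes, classes, scores, category_index,
                          width, height, min_score_thresh=0.6):
    """Return one dict per detection whose score exceeds the threshold.

    Hypothetical helper: boxes has shape (1, N, 4) with normalized
    [ymin, xmin, ymax, xmax]; classes and scores have shape (1, N).
    """
    boxes = np.squeeze(boxes, axis=0)
    classes = np.squeeze(classes, axis=0).astype(np.int32)
    scores = np.squeeze(scores, axis=0)

    # The three arrays are index-aligned, so a single mask filters them all.
    keep = scores > min_score_thresh

    records = []
    for box, cls, score in zip(boxes[keep], classes[keep], scores[keep]):
        ymin, xmin, ymax, xmax = box
        records.append({
            "class": category_index[int(cls)]["name"],
            "score": float(score),
            # Rescale normalized coordinates to pixels.
            "box": {
                "xmin": int(round(xmin * width)),
                "ymin": int(round(ymin * height)),
                "xmax": int(round(xmax * width)),
                "ymax": int(round(ymax * height)),
            },
        })
    return records


# Made-up stand-ins for the detector output (3 detections, not 300).
boxes = np.array([[[0.25, 0.125, 0.75, 0.5],
                   [0.5, 0.5, 1.0, 0.75],
                   [0.0, 0.0, 0.25, 0.25]]])
classes = np.array([[18.0, 1.0, 3.0]])
scores = np.array([[0.9, 0.8, 0.1]])
category_index = {
    1: {"id": 1, "name": "person"},
    3: {"id": 3, "name": "car"},
    18: {"id": 18, "name": "dog"},
}

records = detections_to_records(boxes, classes, scores, category_index,
                                width=200, height=100)
payload = json.dumps({"image_name": "image1.jpg", "bounding_boxes": records},
                     indent=2)
print(payload)
```

As far as I know, visualize_boxes_and_labels_on_image_array only draws boxes whose score is above its own min_score_thresh argument (0.5 by default), which would explain why only 2 of the 300 boxes show up on the image. Passing the same threshold to the visualizer and to the filter should keep the JSON and the drawn boxes in agreement.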
