Why GCP Vision API returns worse results in python than at its online demo


Question


I wrote a basic python script to call and use the GCP Vision API. My aim is to send an image of a product to it and to retrieve (with OCR) the words written on this box. I have a predefined list of brands so I can search within the returned text from the API the brand and detect what it is.


My python script is the following:

import io
import os

from google.cloud import vision
from google.cloud.vision import types

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "**************************"


def detect_text(file):
    """Detects text in the file."""
    client = vision.ImageAnnotatorClient()

    with io.open(file, 'rb') as image_file:
        content = image_file.read()

    image = types.Image(content=content)

    response = client.text_detection(image=image)
    texts = response.text_annotations
    print('Texts:')

    for text in texts:
        print('\n"{}"'.format(text.description))

        vertices = (['({},{})'.format(vertex.x, vertex.y)
                     for vertex in text.bounding_poly.vertices])

        print('bounds: {}'.format(','.join(vertices)))


file_name = "Image.jpg"
detect_text(file_name)
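Since the stated goal is to match a predefined brand list against the OCR output, the lookup step could be sketched like this (a minimal sketch; the `find_brand` helper and the brand list are hypothetical, and matching is case-insensitive so that `ACUVUE` in the image matches the brand name `Acuvue`):

```python
def find_brand(ocr_text, brands):
    """Return the first brand found in the OCR text, or None.

    Hypothetical helper: compares case-insensitively because OCR
    output is usually upper-case while brand names are mixed-case.
    """
    lowered = ocr_text.lower()
    for brand in brands:
        if brand.lower() in lowered:
            return brand
    return None


brands = ["Acuvue", "Dailies", "Biofinity"]  # illustrative brand list
demo_text = "FOR ASTIGMATISM 1-DAY ACUVUE MOIST WITH LACREON"
print(find_brand(demo_text, brands))  # Acuvue
```

In the full script, `demo_text` would be the concatenated `text.description` values returned by the API.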


For now, I am experimenting with the following product image: (951 × 335 resolution)

Its brand is Acuvue.


The problem is the following. When I am testing the online demo of GCP Cloud Vision API then I am getting the following text result for this image:

FOR ASTIGMATISM 1-DAY ACUVUE MOIST WITH LACREON™ 30 Lenses BRAND CONTACT LENSES UV BLOCKING


(The json result for this returns all the above words including the word Acuvue which matters for me but the json is too long to post it here)


Therefore, the online demo detects pretty well the text on the product and at least it detects accurately the word Acuvue (which is the brand). However, when I am calling the same API in my python script with the same image I am getting the following result:

Texts:

"1.DAY
FOR ASTIGMATISM
WITH
LACREONTM
MOIS
30 Lenses
BRAND CONTACT LENSES
UV BLOCKING
"
bounds: (221,101),(887,101),(887,284),(221,284)

"1.DAY"
bounds: (221,101),(312,101),(312,125),(221,125)

"FOR"
bounds: (622,107),(657,107),(657,119),(622,119)

"ASTIGMATISM"
bounds: (664,107),(788,107),(788,119),(664,119)

"WITH"
bounds: (614,136),(647,136),(647,145),(614,145)

"LACREONTM"
bounds: (600,151),(711,146),(712,161),(601,166)

"MOIS"
bounds: (378,162),(525,153),(528,200),(381,209)

"30"
bounds: (614,177),(629,178),(629,188),(614,187)

"Lenses"
bounds: (634,178),(677,180),(677,189),(634,187)

"BRAND"
bounds: (361,210),(418,210),(418,218),(361,218)

"CONTACT"
bounds: (427,209),(505,209),(505,218),(427,218)

"LENSES"
bounds: (514,209),(576,209),(576,218),(514,218)

"UV"
bounds: (805,274),(823,274),(823,284),(805,284)

"BLOCKING"
bounds: (827,276),(887,276),(887,284),(827,284)


However, this does not detect the word "Acuvue" at all, as the demo does!

Why is this happening?


Can I fix something in my python script to make it work properly?

Answer

From the documentation:


The Vision API can detect and extract text from images. There are two annotation features that support OCR:


  • TEXT_DETECTION detects and extracts text from any image. For example, a photograph might contain a street sign or traffic sign. The JSON includes the entire extracted string, as well as individual words, and their bounding boxes.

  • DOCUMENT_TEXT_DETECTION also extracts text from an image, but the response is optimized for dense text and documents. The JSON includes page, block, paragraph, word, and break information.


My hope was that the web API was actually using the latter, and then filtering the results based on the confidence.


A DOCUMENT_TEXT_DETECTION response includes additional layout information, such as page, block, paragraph, word, and break information, along with confidence scores for each.


At any rate, I was hoping (and my experience has been) that the latter method would "try harder" to find all the strings.


I don't think you were doing anything "wrong". There are just two parallel detection methods. One (DOCUMENT_TEXT_DETECTION) is more intense, optimized for documents (likely for straightened, aligned and evenly spaced lines), and gives more information that might be unnecessary for some applications.


So I suggest you modify your code to use DOCUMENT_TEXT_DETECTION, following the Python example in the documentation.
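A minimal sketch of that change: walk the page → block → paragraph → word hierarchy of the response and filter on each word's confidence. The real objects returned by `document_text_detection` expose this hierarchy as attributes; here plain dicts stand in for them so the sketch is self-contained, and the `min_confidence` threshold is an arbitrary choice:

```python
def extract_confident_words(full_text_annotation, min_confidence=0.8):
    """Collect words from a document_text_detection-style response,
    keeping only those at or above the confidence threshold.

    `full_text_annotation` is modeled here as nested dicts mirroring
    the page/block/paragraph/word hierarchy of the real response.
    """
    words = []
    for page in full_text_annotation["pages"]:
        for block in page["blocks"]:
            for paragraph in block["paragraphs"]:
                for word in paragraph["words"]:
                    # Each word is a list of symbols (characters)
                    text = "".join(s["text"] for s in word["symbols"])
                    if word["confidence"] >= min_confidence:
                        words.append(text)
    return words


# Stand-in response shaped like the API's JSON (values are illustrative)
annotation = {
    "pages": [{
        "blocks": [{
            "paragraphs": [{
                "words": [
                    {"confidence": 0.97,
                     "symbols": [{"text": c} for c in "ACUVUE"]},
                    {"confidence": 0.41,
                     "symbols": [{"text": c} for c in "MOIS"]},
                ]
            }]
        }]
    }]
}

print(extract_confident_words(annotation))  # ['ACUVUE']
```

With a real response you would pass `response.full_text_annotation` and read the same fields as attributes instead of dict keys.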


Lastly, my guess is that the \342\204\242 you ask about are escaped octal values corresponding to utf-8 characters it thinks it found when trying to identify the ™ symbol.

If you use the following snippet:

b = b"\342\204\242"
s = b.decode('utf8')
print(s)


You'll be happy to see that it prints ™.
