How to group blocks that are part of a bigger sentence in Google Cloud Vision API?

Question

I am using the Google Cloud Vision API from Python to detect text on hoarding boards, the kind usually found above a shop or store. So far I have been able to detect individual words and the coordinates of their bounding polygons. Is there a way to group the detected words based on their relative positions and sizes?

For example, the name of the store is usually written in a single size and its words are aligned. Does the API provide functions that group words which are probably parts of a bigger sentence (the store name, the address, etc.)?

If the API does not provide such functions, what would be a good approach to grouping them? The following is an example of what I have obtained so far:

Vision API output excerpt:

description: "SHOP"
bounding_poly {
  vertices {
    x: 4713
    y: 737
  }
  vertices {
    x: 5538
    y: 737
  }
  vertices {
    x: 5538
    y: 1086
  }
  vertices {
    x: 4713
    y: 1086
  }
}
, description: "OVOns"
bounding_poly {
  vertices {
    x: 6662
    y: 1385
  }
  vertices {
    x: 6745
    y: 1385
  }
  vertices {
    x: 6745
    y: 1402
  }
  vertices {
    x: 6662
    y: 1402
  }
}

Recommended answer

I suggest taking a look at the TextAnnotation response format, which is returned when you use DOCUMENT_TEXT_DETECTION for an OCR request. This response contains detailed information about the image metadata and the text content, which can be used to group the text by block, paragraph, word, etc., as described in the public documentation:

TextAnnotation contains a structured representation of OCR extracted text. The hierarchy of an OCR extracted text structure is like this: TextAnnotation -> Page -> Block -> Paragraph -> Word -> Symbol
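As a rough illustration of that hierarchy, the sketch below walks a response that has already been parsed into a plain Python dict and rebuilds one string per paragraph. It assumes the camelCase field names of the JSON form of the response (pages, blocks, paragraphs, words, symbols, text); the function name paragraphs_from_annotation and the sample data are made up for this example and are not real API output.

```python
from typing import Dict, List


def paragraphs_from_annotation(full_text_annotation: Dict) -> List[str]:
    """Return one string per paragraph, joining its words with spaces."""
    paragraphs = []
    for page in full_text_annotation.get("pages", []):
        for block in page.get("blocks", []):
            for paragraph in block.get("paragraphs", []):
                # A word is stored as a list of symbols (single characters).
                words = [
                    "".join(symbol["text"] for symbol in word.get("symbols", []))
                    for word in paragraph.get("words", [])
                ]
                paragraphs.append(" ".join(words))
    return paragraphs


def _word(text: str) -> Dict:
    """Build a word entry from a string (helper for the sample only)."""
    return {"symbols": [{"text": char} for char in text]}


# Hand-written sample in the assumed JSON shape, not real API output.
sample = {
    "pages": [
        {
            "blocks": [
                {"paragraphs": [{"words": [_word("SHOP"), _word("NAME")]}]},
                {"paragraphs": [{"words": [_word("Main"), _word("Street")]}]},
            ]
        }
    ]
}

print(paragraphs_from_annotation(sample))
```

Whether the paragraphs returned for a signboard actually match the store name or address depends on how the API segments the image, so the result is worth checking on real photos.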

Additionally, you can follow this useful example, which shows how to organize the text extracted from a receipt image by processing the fullTextAnnotation response content.
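If the API's own block and paragraph grouping does not fit a particular signboard, one possible fallback is a simple geometric heuristic on the word bounding polygons, along the lines the question suggests. The sketch below is not part of the Vision API: the function names, the thresholds height_ratio and overlap_ratio, and the word "BIG" with its coordinates are assumptions for illustration. Only the "SHOP" and "OVOns" boxes are taken from the output excerpt in the question.

```python
from typing import List, Tuple

Vertices = List[Tuple[int, int]]
Word = Tuple[str, Vertices]


def box(vertices: Vertices) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of a bounding polygon."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return min(xs), min(ys), max(xs), max(ys)


def group_words(words: List[Word],
                height_ratio: float = 0.7,
                overlap_ratio: float = 0.5) -> List[str]:
    """Greedily group words of similar height that share a text line.

    Words are visited left to right. A word joins the first group whose
    last word has a similar height and overlaps it vertically.
    """
    groups: List[List[Word]] = []
    for text, vertices in sorted(words, key=lambda w: box(w[1])[0]):
        _, top, _, bottom = box(vertices)
        height = bottom - top
        for group in groups:
            _, g_top, _, g_bottom = box(group[-1][1])
            g_height = g_bottom - g_top
            similar = min(height, g_height) >= height_ratio * max(height, g_height)
            overlap = min(bottom, g_bottom) - max(top, g_top)
            if similar and overlap >= overlap_ratio * min(height, g_height):
                group.append((text, vertices))
                break
        else:
            # No existing group matched, so start a new one.
            groups.append([(text, vertices)])
    return [" ".join(text for text, _ in group) for group in groups]


words = [
    # From the output excerpt in the question.
    ("SHOP", [(4713, 737), (5538, 737), (5538, 1086), (4713, 1086)]),
    ("OVOns", [(6662, 1385), (6745, 1385), (6745, 1402), (6662, 1402)]),
    # Hypothetical neighbouring word, added only for this example.
    ("BIG", [(3800, 740), (4600, 740), (4600, 1080), (3800, 1080)]),
]

print(group_words(words))
```

This version ignores the horizontal gap between words and assumes roughly horizontal text, so a real implementation would probably need a distance check and some tuning of the thresholds on actual images.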
