AWS Textract with hand-written checkboxes


Question

I have thousands of survey forms that I need to scan and then upload to my C# system in order to extract the data and enter it into a database. The surveys are a mix of hand-written 1) text boxes and 2) checkboxes. I am currently using the Azure Read API to extract the hand-written text, which should work fine, e.g. question #4 below returns 'Python' and 'coding'.

So my question: will any AWS Textract API let me extract which checkbox is marked? For example, for question #1 below I need a string back saying 'disagree'. Is this possible with any AWS Textract API?

Unfortunately, Azure Read API and Google Vision OCR do not offer this functionality, so if AWS Textract doesn't help me with this I will have to do something manual, like checking changes in pixel color to detect ticked checkboxes.

Survey type:

Recommended answer

Yes, Amazon Textract supports detection of various field inputs like checkboxes and radio buttons. You can read more about the details in the docs on selection elements: https://docs.aws.amazon.com/textract/latest/dg/how-it-works-selectables.html

I wrote a quick script to call Textract for your image with the following code, which properly identified the keys and values for the different form fields, in addition to identifying whether a given field was selected/unselected.

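# python 3
import boto3

# instantiate client (uses the default AWS credentials and region)
textract = boto3.client('textract')

# read image bytes
with open("textract-test.png", "rb") as image:
    image_data = image.read()

# call textract endpoint; FORMS enables key-value and selection-element detection
response = textract.analyze_document(Document={'Bytes': image_data}, FeatureTypes=['FORMS'])

# the parsed blocks are returned under the 'Blocks' key
blocks = response['Blocks']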

The resulting output will be a series of "blocks", which represent individual blocks of text or form inputs. Parsing this JSON, we can find blocks that correspond to selected checkboxes and resemble the following:

    {
      "BlockType": "SELECTION_ELEMENT",
      "Confidence": 54.00064468383789,
      "Geometry": {
        "BoundingBox": {
          "Width": 0.030619779601693153,
          "Height": 0.024501724168658257,
          "Left": 0.4210366904735565,
          "Top": 0.439885675907135
        },
        "Polygon": [
          {
            "X": 0.4210366904735565,
            "Y": 0.439885675907135
          },
          {
            "X": 0.4516564607620239,
            "Y": 0.439885675907135
          },
          {
            "X": 0.4516564607620239,
            "Y": 0.4643873870372772
          },
          {
            "X": 0.4210366904735565,
            "Y": 0.4643873870372772
          }
        ]
      },
      "Id": "0abb6f4e-4512-4581-b261-a45f2426973f",
      "SelectionStatus": "SELECTED" // value of interest. Alternatively, "NOT_SELECTED"
    },
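To get from a block like this to a string such as 'disagree', the selection element has to be linked back to the label next to it. The following is a minimal sketch of one way to do that, assuming the FORMS response pairs each label with its checkbox through KEY_VALUE_SET blocks and their VALUE and CHILD relationships. The helper names (related_ids, key_text, selected_labels) and the tiny SAMPLE_BLOCKS list with made-up IDs are illustrative only; they are not part of the original answer or of the Textract API.

```python
def related_ids(block, rel_type):
    """Return the IDs a block points to through relationships of the given type."""
    ids = []
    for rel in block.get("Relationships", []):
        if rel["Type"] == rel_type:
            ids.extend(rel["Ids"])
    return ids


def key_text(key_block, by_id):
    """Join the WORD children of a KEY block into its label text."""
    words = [
        by_id[i]["Text"]
        for i in related_ids(key_block, "CHILD")
        if by_id[i]["BlockType"] == "WORD"
    ]
    return " ".join(words)


def selected_labels(blocks):
    """Return the label text of every form key whose checkbox is SELECTED."""
    by_id = {b["Id"]: b for b in blocks}
    selected = []
    for block in blocks:
        is_key = (
            block["BlockType"] == "KEY_VALUE_SET"
            and "KEY" in block.get("EntityTypes", [])
        )
        if not is_key:
            continue
        # KEY -> VALUE -> SELECTION_ELEMENT
        for value_id in related_ids(block, "VALUE"):
            for child_id in related_ids(by_id[value_id], "CHILD"):
                child = by_id[child_id]
                if (
                    child["BlockType"] == "SELECTION_ELEMENT"
                    and child.get("SelectionStatus") == "SELECTED"
                ):
                    selected.append(key_text(block, by_id))
    return selected


# Hand-written stand-in for response["Blocks"]; the IDs are invented for illustration.
SAMPLE_BLOCKS = [
    {"Id": "w1", "BlockType": "WORD", "Text": "agree"},
    {"Id": "w2", "BlockType": "WORD", "Text": "disagree"},
    {"Id": "s1", "BlockType": "SELECTION_ELEMENT", "SelectionStatus": "NOT_SELECTED"},
    {"Id": "s2", "BlockType": "SELECTION_ELEMENT", "SelectionStatus": "SELECTED"},
    {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
     "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                       {"Type": "CHILD", "Ids": ["w1"]}]},
    {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
     "Relationships": [{"Type": "CHILD", "Ids": ["s1"]}]},
    {"Id": "k2", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
     "Relationships": [{"Type": "VALUE", "Ids": ["v2"]},
                       {"Type": "CHILD", "Ids": ["w2"]}]},
    {"Id": "v2", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
     "Relationships": [{"Type": "CHILD", "Ids": ["s2"]}]},
]

print(selected_labels(SAMPLE_BLOCKS))
```

With a real response, the same function would be called on the list under the 'Blocks' key. Given the fairly low confidence in the block above (about 54), it may also be worth checking each selection element's "Confidence" before trusting its status on hand-marked forms.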

Apologies for not whipping up an example in C#, but you can leverage Textract via the CLI or the AWS .NET SDK for similar effects.

Note: If you're looking to just get a feel for what response Amazon Textract will return for your data, you can navigate to the Amazon Textract page in the AWS Management Console and use the image test application in there. You can use the GUI to visualize some of the results, or download the API responses in their entirety.
