表单识别器速度问题 [英] Form Recognizer speed issues

查看：111 发布时间：2020/7/23 1:10:28 python azure-cognitive-services form-recognizer

本文介绍了表单识别器速度问题的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我正在使用带有标签的自定义模型(使用示例标签工具创建)，并从此

I'm using a custom model with labels (created with the sample labeling tool) and getting the results with the "Python Form Recognizer Async Analyze" V2 SDK Code from the bottom of this 1 page. It basicly works but it took over 20 seconds for a single page PDF file to get the results (6 labels used, S0 pricing model). 150 single page pdf files took over one hour. We also tested with the V1 SDK Preview Version (without labels) of the form recognizer which was significantly faster then V2.

我知道V2现在是异步的，但是有什么方法可以加快表单识别的速度吗? 下面是我基本使用的代码:

I know V2 is async now but is there anything which could be done to speed up form recognition? Below is the code i'm basicly using:

########### Python Form Recognizer Async Analyze #############
import json
import time
from requests import get, post

# Endpoint URL
endpoint = r"<endpoint>"
apim_key = "<subsription key>"
model_id = "<model_id>"
post_url = endpoint + "/formrecognizer/v2.0-preview/custom/models/%s/analyze" % model_id
source = r"<file path>"
params = {
    "includeTextDetails": True
}

headers = {
    # Request headers
    'Content-Type': '<file type>',
    'Ocp-Apim-Subscription-Key': apim_key,
}
with open(source, "rb") as f:
    data_bytes = f.read()

try:
    resp = post(url = post_url, data = data_bytes, headers = headers, params = params)
    if resp.status_code != 202:
        print("POST analyze failed:\n%s" % json.dumps(resp.json()))
        quit()
    print("POST analyze succeeded:\n%s" % resp.headers)
    get_url = resp.headers["operation-location"]
except Exception as e:
    print("POST analyze failed:\n%s" % str(e))
    quit() 

n_tries = 15
n_try = 0
wait_sec = 5
max_wait_sec = 60
while n_try < n_tries:
    try:
        resp = get(url = get_url, headers = {"Ocp-Apim-Subscription-Key": apim_key})
        resp_json = resp.json()
        if resp.status_code != 200:
            print("GET analyze results failed:\n%s" % json.dumps(resp_json))
            quit()
        status = resp_json["status"]
        if status == "succeeded":
            print("Analysis succeeded:\n%s" % json.dumps(resp_json))
            quit()
        if status == "failed":
            print("Analysis failed:\n%s" % json.dumps(resp_json))
            quit()
        # Analysis still running. Wait and retry.
        time.sleep(wait_sec)
        n_try += 1
        wait_sec = min(2*wait_sec, max_wait_sec)     
    except Exception as e:
        msg = "GET analyze results failed:\n%s" % str(e)
        print(msg)
        quit()
print("Analyze operation did not complete within the allocated time.")

表单识别器速度问题 [英] Form Recognizer speed issues

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录关闭

表单识别器速度问题 [英] Form Recognizer speed issues

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录 关闭

登录关闭