Python - Pytesseract extracts incorrect text from image

Views: 442 Published: 2020/5/20 21:08:16 Tags: python opencv image-processing opencv-python


Problem description

I used the code below in Python to extract text from an image:

import cv2
import numpy as np
import pytesseract
from PIL import Image

# Path of working folder on Disk
src_path = "<dir path>"

def get_string(img_path):
    # Read image with opencv
    img = cv2.imread(img_path)

    # Convert to gray
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Apply dilation and erosion to remove some noise
    kernel = np.ones((1, 1), np.uint8)
    img = cv2.dilate(img, kernel, iterations=1)
    img = cv2.erode(img, kernel, iterations=1)

    # Write image after removed noise
    cv2.imwrite(src_path + "removed_noise.png", img)

    #  Apply threshold to get image with only black and white
    #img = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 2)

    # Write the image after apply opencv to do some ...

    cv2.imwrite(src_path + "thres.png", img)

    # Recognize text with tesseract for python
    result = pytesseract.image_to_string(Image.open(img_path))#src_path+ "thres.png"))

    # Remove template file
    #os.remove(temp)

    return result


print('--- Start recognize text from image ---')
print(get_string(src_path + "test.jpg"))

print("------ Done -------")

But the output is incorrect. The input file is an image (not reproduced here).

The output received is '0001' instead of 'D001'.

The output received is '3001' instead of 'B001'.

What code changes are required to retrieve the right characters from the image, and how can pytesseract be trained to return the right characters for all font types in the image (including bold characters)?
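Two things in the code above are worth noting. First, image_to_string is called on Image.open(img_path), which is the original file, so none of the OpenCV preprocessing reaches Tesseract. Second, dilating and eroding with a 1x1 kernel leaves the image unchanged. The sketch below is one possible direction, not a confirmed fix: it passes the preprocessed array to Tesseract, treats the image as a single text line (--psm 7), and restricts the character set. The helper names (build_config, looks_like_label, ocr_label), the 2x upscaling, the Otsu threshold, and the "one letter plus three digits" format are all assumptions inferred from the 'D001' and 'B001' examples. Depending on the Tesseract version and engine, the whitelist option may be ignored.

```python
import re

# Assumed label format, inferred from the two examples in the question
# ("D001", "B001"): one uppercase letter followed by three digits.
LABEL_PATTERN = re.compile(r"[A-Z][0-9]{3}")


def build_config(psm=7, whitelist="ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"):
    """Build a Tesseract config string: page segmentation mode plus a whitelist."""
    return f"--psm {psm} -c tessedit_char_whitelist={whitelist}"


def looks_like_label(text):
    """Return True if the OCR output matches the assumed label format."""
    return LABEL_PATTERN.fullmatch(text.strip()) is not None


def ocr_label(img_path):
    """Preprocess the image and run Tesseract on the preprocessed result."""
    # Imported here so the helpers above work without OpenCV or Tesseract.
    import cv2
    import pytesseract

    img = cv2.imread(img_path)
    if img is None:
        raise FileNotFoundError(img_path)

    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Upscale by 2x (an assumed factor); small glyphs are often misread.
    gray = cv2.resize(gray, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)
    # Otsu's method picks a global black/white threshold automatically.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # Pass the preprocessed array itself, not the original file on disk.
    text = pytesseract.image_to_string(binary, config=build_config())
    return text.strip()
```

Calling ocr_label(src_path + "test.jpg") returns the recognized string, and looks_like_label can then flag results such as '0001' or '3001' that do not fit the assumed format, so misreads are at least detected rather than silently accepted. This does not cover training Tesseract on additional fonts.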
