Read text below barcode pytesseract python


Question


I am trying to read the number below a barcode in an image. The same code works fine on some other images, but not on this one. Here's the image:

And here is the code so far:

import re

import cv2
import pytesseract

def readNumber():
    # sTemp: path to the image, defined elsewhere
    image = cv2.imread(sTemp)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    blur = cv2.GaussianBlur(gray, (3,3), 0)
    thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3,3))
    opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel, iterations=1)
    invert = 255 - opening
    data = pytesseract.image_to_string(invert, lang='eng', config='--psm 6 -c tessedit_char_whitelist=0123456789')
    print(data)
    try:
        # nine digits followed by a non-digit character
        data = re.findall(r'(\d{9})\D', data)[0]
    except IndexError:
        data = ''
    return data

I call it with this line:

readNumber()

Here's another example:

This is the last example, I promise:

I tried this with the third example and it works:

img = cv2.imread("thisimage.png")
blur = cv2.GaussianBlur(img, (3,3), 0)
#gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
txt = pytesseract.image_to_string(blur)
print(txt)

But how do I adapt the code to handle all three cases? I tried the following, but couldn't get the third case to work:

import pytesseract, cv2, re

def readNumber(img):
    img = cv2.imread(img)
    gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    
    try:
        txt = pytesseract.image_to_string(gry)
        #txt  = re.findall('(\d{9})\D', txt)[0]
    except:
        thr = cv2.adaptiveThreshold(gry, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 51, 4)
        txt = pytesseract.image_to_string(thr, config="digits")
        #txt  = re.findall('(\d{9})\D', txt)[0]

    return txt

# M5Pr5         191876320
# RWgrP         202131290
# 6pVH4         193832560
print(readNumber('M5Pr5.png'))

Solution

You don't need any preprocessing or configuration for this input image, since there are no artifacts in it.

import cv2
import pytesseract

img = cv2.imread("RWgrP.png")
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
txt = pytesseract.image_to_string(gry)
print(txt)

Result:

202131290

My pytesseract version is 4.1.1

Update-1


The second image requires preprocessing.

If you apply adaptive thresholding:

The output still contains unwanted characters. If you also set the configuration to digits, the result is:

193832560
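
Even with the digits configuration, it can help to validate the OCR output before trusting it. The snippet below is a minimal sketch of such a post-filter, assuming (as the question's own regex does) that the number is always nine digits long. The helper name `extract_number` is hypothetical. Unlike the pattern `(\d{9})\D` from the question, it does not require a trailing non-digit character, so it also matches when the number is the last thing in the string.

```python
import re

def extract_number(text, length=9):
    """Return the first run of exactly `length` digits in `text`, or ''."""
    # Lookarounds reject longer digit runs without needing a trailing character.
    pattern = rf"(?<!\d)\d{{{length}}}(?!\d)"
    match = re.search(pattern, text)
    return match.group(0) if match else ""

print(extract_number("|| 193832560 \n"))
```

The `length` parameter is only there so the same sketch can be reused if the labels carry a different number of digits.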

Update-2


For the third image, you need to change the adaptive method. Using ADAPTIVE_THRESH_MEAN_C results in:

191876320

Everything else stays the same.

Code:

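import cv2
import pytesseract

img = cv2.imread("6pVH4.png")
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Adaptive threshold with block size 51 and constant 4.
# For the third image, swap in cv2.ADAPTIVE_THRESH_MEAN_C.
thr = cv2.adaptiveThreshold(gry, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 51, 4)
# Restrict recognition to digits.
txt = pytesseract.image_to_string(thr, config="digits")
print(txt)
# Show the thresholded image for inspection.
cv2.imshow("thr", thr)
cv2.waitKey(0)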
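
To handle all three images with one function, one option is to try the three approaches above in order and keep the first result that looks like a valid number. The following is a rough sketch, not a tested recipe: the function names are hypothetical, and "valid" is assumed to mean exactly nine digits, as in the question. The `try`/`except` in the question never reaches its fallback because `image_to_string` returns whatever text it recognizes rather than raising on a poor read, so the result is checked explicitly here.

```python
import re

def first_nine_digit_result(texts):
    """Return the first OCR output that is exactly nine digits, or ''."""
    for text in texts:
        cleaned = text.strip()  # drops trailing newline / form feed
        if re.fullmatch(r"\d{9}", cleaned):
            return cleaned
    return ""

def ocr_attempts(path):
    """Yield OCR text from progressively heavier preprocessing."""
    # Imported lazily so the validation helper runs without OpenCV/Tesseract.
    import cv2
    import pytesseract

    img = cv2.imread(path)
    if img is None:  # cv2.imread returns None when the file cannot be read
        raise FileNotFoundError(path)
    gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # 1) Plain grayscale, no configuration.
    yield pytesseract.image_to_string(gry)
    # 2) and 3) Adaptive threshold (Gaussian, then mean) with digits config.
    for method in (cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.ADAPTIVE_THRESH_MEAN_C):
        thr = cv2.adaptiveThreshold(gry, 255, method, cv2.THRESH_BINARY, 51, 4)
        yield pytesseract.image_to_string(thr, config="digits")

def read_number(path):
    """Try each approach in turn; stop at the first nine-digit result."""
    return first_nine_digit_result(ocr_attempts(path))
```

Because `ocr_attempts` is a generator, the thresholding passes only run when the earlier attempts fail the check. Usage would be `print(read_number("M5Pr5.png"))`.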
