Simple Digit Recognition OCR in OpenCV-Python


Problem Description


I am trying to implement a "Digit Recognition OCR" in OpenCV-Python (cv2). It is just for learning purposes. I would like to learn both KNearest and SVM features in OpenCV.

I have 100 samples (i.e. images) of each digit. I would like to train with them.

There is a sample, letter_recog.py, that comes with the OpenCV samples, but I still couldn't figure out how to use it. I don't understand what the samples, responses, etc. are. Also, it loads a txt file at the start, which I didn't understand at first.

After searching a little, I found a letter_recognition.data file in the cpp samples. I used it and wrote some code for cv2.KNearest, modeled on letter_recog.py (just for testing):

import numpy as np
import cv2

fn = 'letter-recognition.data'
a = np.loadtxt(fn, np.float32, delimiter=',', converters={ 0 : lambda ch : ord(ch)-ord('A') })
samples, responses = a[:,1:], a[:,0]

model = cv2.KNearest()
retval = model.train(samples,responses)
retval, results, neigh_resp, dists = model.find_nearest(samples, k = 10)
print results.ravel()

It gave me an array of size 20000, and I don't understand what it is.

Questions:

1) What is the letter_recognition.data file? How do I build that file from my own data set?

2) What does results.ravel() denote?

3) How can we write a simple digit recognition tool using the letter_recognition.data file (either KNearest or SVM)?

Solution

Well, I decided to work through my own question to solve the problem above. What I wanted was to implement a simple OCR using the KNearest or SVM features in OpenCV. Below is what I did and how. (It is just for learning how to use KNearest for simple OCR purposes.)

1) My first question was about the letter_recognition.data file that comes with the OpenCV samples. I wanted to know what is inside that file.

It contains a letter, along with 16 features of that letter.

A Stack Overflow question helped me find this out. These 16 features are explained in the paper Letter Recognition Using Holland-Style Adaptive Classifiers. (Although I didn't understand some of the features towards the end.)
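To make the layout concrete, here is a minimal sketch of how one line of such a file splits into a label and 16 features. The sample line is invented for the example (it is not copied from the real file), and `parse_letter_line` is a hypothetical helper name. The label mapping mirrors the `converters` argument used in the question.

```python
def parse_letter_line(line):
    """Split 'LETTER,f1,...,f16' into (label, features)."""
    fields = line.strip().split(',')
    letter, raw_features = fields[0], fields[1:]
    if len(raw_features) != 16:
        raise ValueError("expected 16 features, got %d" % len(raw_features))
    # Same mapping as the converter in the question: 'A' -> 0, 'B' -> 1, ...
    label = ord(letter) - ord('A')
    features = [float(v) for v in raw_features]
    return label, features

# Hypothetical line in the same layout (values invented for the example).
example = "C,4,7,5,5,2,6,8,3,6,9,7,8,2,8,4,10"
label, features = parse_letter_line(example)
print(label, len(features))
```

Building the same kind of file from your own data would mean writing one such line per sample, with whatever features you choose to compute.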

2) I knew that without understanding all those features, it would be difficult to use that method. I tried some other papers, but all were a little difficult for a beginner.

So I just decided to take all the pixel values as my features. (I was not worried about accuracy or performance. I just wanted it to work, even with minimal accuracy.)
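As a rough illustration of "pixel values as features", the sketch below flattens a 10x10 image into a single row of 100 numbers. The image here is a synthetic stand-in (an assumption for the example). In the real code the 10x10 array comes from resizing a digit crop.

```python
import numpy as np

# Stand-in for a 10x10 grayscale digit crop (values 0..255); a real crop
# would come from resizing the region inside the digit's bounding box.
digit = np.zeros((10, 10), np.uint8)
digit[2:8, 4:6] = 255  # a crude vertical stroke, vaguely like a "1"

# Flatten the grid row by row into one feature vector of 100 values.
features = digit.reshape((1, 100)).astype(np.float32)

print(features.shape)
print(int(features.sum() / 255))  # number of "on" pixels
```

Each digit then becomes one row, and stacking the rows gives the samples array that the classifier is trained on.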

I used the image below as my training data:

(I know the amount of training data is small. But since all the digits are of the same font and size, I decided to try it.)

To prepare the data for training, I wrote a small script in OpenCV. It does the following:

a) Loads the image.

b) Selects the digits (by finding contours and applying constraints on the area and height of the characters to avoid false detections).

c) Draws the bounding rectangle around one digit and waits for a key press. At this point we press the digit key that corresponds to the digit in the box.

d) Once the corresponding digit key is pressed, it resizes this box to 10x10 and saves the 100 pixel values in one array (here, samples) and the manually entered digit in another array (here, responses).

e) Then saves both arrays in separate txt files.
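Steps c) and d) rely on the fact that the digit keys produce the ASCII codes 48 ('0') through 57 ('9'). The sketch below simulates that labelling logic without opening any window. The key presses are invented for the example and `key_to_label` is a hypothetical helper name.

```python
# For the digit keys, the key code is the ASCII code of the character:
# 48 is '0' and 57 is '9'.
DIGIT_KEYS = list(range(48, 58))

def key_to_label(key):
    """Return the digit for a digit key, or None for any other key."""
    if key in DIGIT_KEYS:
        return int(chr(key))
    return None

responses = []
# Simulated key presses: '3', '1', '4', 'x', '1'
for key in [51, 49, 52, ord('x'), 49]:
    label = key_to_label(key)
    if label is not None:   # any non-digit key skips that box
        responses.append(label)

print(responses)
```

Skipping non-digit keys is handy in practice: if a contour is not really a digit, pressing any other key moves on without recording a sample.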

At the end of the manual classification, all the digits in the training image (train.png) have been labeled by hand, and the image looks like this:

Below is the code I used for this (of course, not so clean):

import sys

import numpy as np
import cv2

im = cv2.imread('pitrain.png')

# Preprocess: grayscale, blur, then adaptive threshold.
gray = cv2.cvtColor(im,cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray,(5,5),0)
# adaptiveMethod 1 = ADAPTIVE_THRESH_GAUSSIAN_C, thresholdType 1 = THRESH_BINARY_INV
thresh = cv2.adaptiveThreshold(blur,255,1,1,11,2)

#################      Now finding Contours         ###################

# [-2:] keeps (contours, hierarchy) whether findContours returns 2 or 3 values
contours,hierarchy = cv2.findContours(thresh,cv2.RETR_LIST,cv2.CHAIN_APPROX_SIMPLE)[-2:]

samples =  np.empty((0,100))
responses = []
keys = [i for i in range(48,58)]  # key codes for '0'..'9'

for cnt in contours:
    if cv2.contourArea(cnt)>50:  # ignore small specks
        [x,y,w,h] = cv2.boundingRect(cnt)

        if  h>28:  # keep only boxes tall enough to be a digit
            cv2.rectangle(im,(x,y),(x+w,y+h),(0,0,255),2)
            roi = thresh[y:y+h,x:x+w]
            roismall = cv2.resize(roi,(10,10))
            cv2.imshow('norm',im)
            key = cv2.waitKey(0)

            if key == 27:  # (escape to quit)
                sys.exit()
            elif key in keys:
                # the pressed digit is the label; the 100 pixels are the features
                responses.append(int(chr(key)))
                sample = roismall.reshape((1,100))
                samples = np.append(samples,sample,0)

responses = np.array(responses,np.float32)
responses = responses.reshape((responses.size,1))
print("training complete")

np.savetxt('generalsamples.data',samples)
np.savetxt('generalresponses.data',responses)


Now we move on to the training and testing part.

For testing I used the image below, which has the same type of digits I used for training.

For training we do the following:

a) Load the txt files we saved earlier.

b) Create an instance of the classifier we are using (here, it is KNearest).

c) Use the KNearest.train function to train on the data.

For testing we do the following:

a) Load the image used for testing.

b) Process the image as before and extract each digit using contour methods.

c) Draw a bounding box for it, then resize it to 10x10 and store its pixel values in an array, as done earlier.

d) Use the KNearest.find_nearest() function to find the nearest item to the one we gave. (If we are lucky, it recognises the correct digit.)
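To show what step d) amounts to for k = 1, here is a small sketch in plain NumPy rather than OpenCV. It is not how OpenCV implements it internally, only the same idea: for each query row, find the training row with the smallest distance and return its label. The toy data and the function name `find_nearest_1` are invented for the example.

```python
import numpy as np

def find_nearest_1(train_samples, train_responses, queries):
    """Return, for each query row, the label of its closest training row."""
    train_samples = np.asarray(train_samples, np.float32)
    queries = np.asarray(queries, np.float32)
    labels = np.asarray(train_responses, np.float32).ravel()
    # Squared Euclidean distance between every query and every training row.
    diff = queries[:, None, :] - train_samples[None, :, :]
    dists = (diff ** 2).sum(axis=2)
    nearest = dists.argmin(axis=1)
    # One predicted label per query, as an N x 1 column.
    return labels[nearest].reshape(-1, 1)

# Toy data: three 4-pixel "images" labelled 0, 1 and 7 (invented values).
train = [[0, 0, 0, 0], [255, 255, 0, 0], [255, 255, 255, 255]]
resp = [0, 1, 7]
results = find_nearest_1(train, resp, [[250, 240, 10, 0], [5, 0, 0, 3]])
print(results.ravel())
```

This also suggests an answer to the earlier question about results.ravel(): the results array holds one predicted label per row that was queried, so querying all the rows of a 20000-row data set would give an array of size 20000, and ravel() just flattens the N x 1 column into a 1-D array.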

I included the last two steps (training and testing) in the single script below:

import cv2
import numpy as np

#######   training part    ############### 
samples = np.loadtxt('generalsamples.data',np.float32)
responses = np.loadtxt('generalresponses.data',np.float32)
responses = responses.reshape((responses.size,1))

# cv2.KNearest is the OpenCV 2.4 API; in OpenCV 3+ the class lives under cv2.ml
model = cv2.KNearest()
model.train(samples,responses)

############################# testing part  #########################

im = cv2.imread('pi.png')
out = np.zeros(im.shape,np.uint8)  # blank canvas for the recognised digits
gray = cv2.cvtColor(im,cv2.COLOR_BGR2GRAY)
# adaptiveMethod 1 = ADAPTIVE_THRESH_GAUSSIAN_C, thresholdType 1 = THRESH_BINARY_INV
thresh = cv2.adaptiveThreshold(gray,255,1,1,11,2)

# [-2:] keeps (contours, hierarchy) whether findContours returns 2 or 3 values
contours,hierarchy = cv2.findContours(thresh,cv2.RETR_LIST,cv2.CHAIN_APPROX_SIMPLE)[-2:]

for cnt in contours:
    if cv2.contourArea(cnt)>50:  # ignore small specks
        [x,y,w,h] = cv2.boundingRect(cnt)
        if  h>28:  # keep only boxes tall enough to be a digit
            cv2.rectangle(im,(x,y),(x+w,y+h),(0,255,0),2)
            roi = thresh[y:y+h,x:x+w]
            # same 10x10 -> 100-value feature vector as in training
            roismall = cv2.resize(roi,(10,10))
            roismall = roismall.reshape((1,100))
            roismall = np.float32(roismall)
            retval, results, neigh_resp, dists = model.find_nearest(roismall, k = 1)
            string = str(int((results[0][0])))
            # write the recognised digit at the same position on the output image
            cv2.putText(out,string,(x,y+h),0,1,(0,255,0))

cv2.imshow('im',im)
cv2.imshow('out',out)
cv2.waitKey(0)

And it worked. Below is the result I got:


Here it worked with 100% accuracy. I assume the reason is that all the digits are of the same kind and the same size.

But anyway, this is a good starting point for beginners (I hope so).
