Simple Digit Recognition OCR in OpenCV-Python


Problem Description


I am trying to implement a "Digit Recognition OCR" in OpenCV-Python (cv2). It is just for learning purposes. I would like to learn both KNearest and SVM features in OpenCV.

I have 100 samples (i.e. images) of each digit. I would like to train with them.

There is a sample letter_recog.py that comes with the OpenCV samples. But I still couldn't figure out how to use it. I don't understand what the samples, responses, etc. are. Also, it loads a txt file at first, which I didn't understand either.

Later, after searching a little, I found a letter_recognition.data file in the cpp samples. I used it and made some code for cv2.KNearest modeled on letter_recog.py (just for testing):

import numpy as np
import cv2

fn = 'letter-recognition.data'
a = np.loadtxt(fn, np.float32, delimiter=',', converters={ 0 : lambda ch : ord(ch)-ord('A') })
samples, responses = a[:,1:], a[:,0]

model = cv2.KNearest()
retval = model.train(samples,responses)
retval, results, neigh_resp, dists = model.find_nearest(samples, k = 10)
print results.ravel()

It gave me an array of size 20000, and I don't understand what it is.

Questions:

1) What is the letter_recognition.data file? How do I build that file from my own data set?

2) What does results.ravel() denote?

3) How can we write a simple digit recognition tool using the letter_recognition.data file (with either KNearest or SVM)?

Solution

Well, I decided to work on my own question in order to solve the above problem. What I wanted was to implement a simple OCR using the KNearest or SVM features in OpenCV. Below is what I did and how. (It is just for learning how to use KNearest for simple OCR purposes.)

1) My first question was about the letter_recognition.data file that comes with the OpenCV samples. I wanted to know what is inside that file.

It contains a letter, along with 16 features of that letter.

And this SOF question helped me find it. These 16 features are explained in the paper Letter Recognition Using Holland-Style Adaptive Classifiers. (Although I didn't understand some of the features at the end.)
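
To make the layout concrete: each record in that file is one comma-separated line, with the label letter first and its 16 features after it. The tiny sketch below mirrors what the converters argument in the loading code from the question does to such a line (the example values are only illustrative, not copied from the real file):

line = "T,2,8,3,5,1,8,13,0,6,6,10,8,0,8,0,8"   # illustrative record: label letter, then 16 features

fields = line.split(',')
label = ord(fields[0]) - ord('A')          # same trick as the converter above: 'T' -> 19
features = [float(v) for v in fields[1:]]  # the 16 numbers become one sample vector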

2) I knew that, without understanding all those features, it would be difficult to use that method. I tried some other papers, but all of them were a little difficult for a beginner.

So I just decided to take all the pixel values as my features. (I was not worried about accuracy or performance; I just wanted it to work, at least with minimal accuracy.)
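
In other words, every digit is simply resized to 10x10 and the raw 100 pixel values themselves form the feature vector, with no hand-crafted features at all. A minimal sketch of that idea (the file name is only a placeholder):

import cv2
import numpy as np

img = cv2.imread('digit.png', 0)                    # 'digit.png' is a placeholder for one cropped digit
img = cv2.resize(img, (10, 10))                     # normalise every digit to 10x10
feature = img.reshape((1, 100)).astype(np.float32)  # the 100 raw pixel values are the feature vector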

I took the image below for my training data:

(I know the amount of training data is small. But since all the letters are of the same font and size, I decided to try it anyway.)

To prepare the data for training, I wrote a small piece of code in OpenCV. It does the following things:

  1. It loads the image.
  2. Selects the digits (obviously by contour finding and applying constraints on the area and height of the letters to avoid false detections).
  3. Draws a bounding rectangle around one letter and waits for a manual key press. This time we ourselves press the digit key corresponding to the letter in the box.
  4. Once the corresponding digit key is pressed, it resizes the box to 10x10 and saves the 100 pixel values in one array (here, samples) and the corresponding manually entered digit in another array (here, responses).
  5. Then it saves both arrays in separate txt files.

At the end of the manual classification, all the digits in the training data (train.png) have been labeled by us, and the image looks like the one below:

Below is the code I used for the above purpose (of course, it is not so clean):

import sys

import numpy as np
import cv2

im = cv2.imread('pitrain.png')
im3 = im.copy()

gray = cv2.cvtColor(im,cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray,(5,5),0)
thresh = cv2.adaptiveThreshold(blur,255,1,1,11,2)

#################      Now finding Contours         ###################

contours,hierarchy = cv2.findContours(thresh,cv2.RETR_LIST,cv2.CHAIN_APPROX_SIMPLE)

samples =  np.empty((0,100))
responses = []
keys = [i for i in range(48,58)]   # ASCII codes of the digit keys '0'-'9'

for cnt in contours:
    if cv2.contourArea(cnt)>50:
        [x,y,w,h] = cv2.boundingRect(cnt)

        if h>28:   # keep only contours tall enough to be a digit
            cv2.rectangle(im,(x,y),(x+w,y+h),(0,0,255),2)
            roi = thresh[y:y+h,x:x+w]
            roismall = cv2.resize(roi,(10,10))
            cv2.imshow('norm',im)
            key = cv2.waitKey(0)

            if key == 27:  # (escape to quit)
                sys.exit()
            elif key in keys:
                responses.append(int(chr(key)))
                sample = roismall.reshape((1,100))
                samples = np.append(samples,sample,0)

responses = np.array(responses,np.float32)
responses = responses.reshape((responses.size,1))
print "training complete"

np.savetxt('generalsamples.data',samples)
np.savetxt('generalresponses.data',responses)
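
(As a side note on question 1 above: if you prefer a single file laid out like letter_recognition.data, i.e. label first and features after, comma-separated, you could merge the two arrays before saving. This is just a sketch on top of the samples and responses arrays produced above, not part of the original answer, and the output file name is made up.)

data = np.hstack((responses, samples))            # label in the first column, 100 pixel features after it
np.savetxt('generaldigits.data', data, delimiter=',', fmt='%g')

# it can then be loaded back the same way as in the question:
# a = np.loadtxt('generaldigits.data', np.float32, delimiter=',')
# samples, responses = a[:, 1:], a[:, 0]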


Now we come to the training and testing part.

For the testing part I used the image below, which has the same type of letters I used for training.

For training we do as follows:

  1. Load the txt files we saved earlier
  2. Create an instance of the classifier we are using (here, KNearest)
  3. Then use the KNearest.train function to train on the data

For testing purposes, we do as follows:

  1. We load the image used for testing
  2. Process the image as before and extract each digit using the contour method
  3. Draw a bounding box for each digit, resize it to 10x10, and store its pixel values in an array as done earlier.
  4. Then we use the KNearest.find_nearest() function to find the item nearest to the one we gave. (If lucky, it recognises the correct digit.)

I have included the last two steps (training and testing) in the single piece of code below:

import cv2
import numpy as np

#######   training part    ############### 
samples = np.loadtxt('generalsamples.data',np.float32)
responses = np.loadtxt('generalresponses.data',np.float32)
responses = responses.reshape((responses.size,1))

model = cv2.KNearest()
model.train(samples,responses)

############################# testing part  #########################

im = cv2.imread('pi.png')
out = np.zeros(im.shape,np.uint8)
gray = cv2.cvtColor(im,cv2.COLOR_BGR2GRAY)
thresh = cv2.adaptiveThreshold(gray,255,1,1,11,2)

contours,hierarchy = cv2.findContours(thresh,cv2.RETR_LIST,cv2.CHAIN_APPROX_SIMPLE)

for cnt in contours:
    if cv2.contourArea(cnt)>50:
        [x,y,w,h] = cv2.boundingRect(cnt)
        if  h>28:
            cv2.rectangle(im,(x,y),(x+w,y+h),(0,255,0),2)
            roi = thresh[y:y+h,x:x+w]
            roismall = cv2.resize(roi,(10,10))
            roismall = roismall.reshape((1,100))
            roismall = np.float32(roismall)
            retval, results, neigh_resp, dists = model.find_nearest(roismall, k = 1)   # the nearest training sample gives the predicted digit
            string = str(int((results[0][0])))
            cv2.putText(out,string,(x,y+h),0,1,(0,255,0))

cv2.imshow('im',im)
cv2.imshow('out',out)
cv2.waitKey(0)
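
A note for anyone on a newer OpenCV: cv2.KNearest and find_nearest belong to the old 2.4 Python API. On OpenCV 3.x/4.x the kNN classifier lives in cv2.ml, so the training and prediction calls look roughly like the sketch below (also note that cv2.findContours returns three values on 3.x and two again on 4.x, so the unpacking above may need a small tweak).

import cv2
import numpy as np

samples = np.loadtxt('generalsamples.data', np.float32)
responses = np.loadtxt('generalresponses.data', np.float32).reshape(-1, 1)

model = cv2.ml.KNearest_create()                     # kNN is created through the cv2.ml module
model.train(samples, cv2.ml.ROW_SAMPLE, responses)

# roismall is prepared exactly as in the testing loop above
# retval, results, neigh_resp, dists = model.findNearest(roismall, k=1)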

And it worked; below is the result I got:


Here it worked with 100% accuracy. I assume this is because all the digits are of the same kind and the same size.
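
If you want an actual number instead of eyeballing the output image, one simple check (a sketch, not part of the original answer) is to hold some of the labelled samples out of training and count how many of them the model predicts correctly:

import cv2
import numpy as np

samples = np.loadtxt('generalsamples.data', np.float32)
responses = np.loadtxt('generalresponses.data', np.float32).reshape(-1, 1)

# hold out every other sample so we do not test on the exact vectors we trained on
train_s, test_s = samples[::2], samples[1::2]
train_r, test_r = responses[::2], responses[1::2]

model = cv2.KNearest()
model.train(train_s, train_r)
retval, results, neigh_resp, dists = model.find_nearest(test_s, k=1)

correct = np.count_nonzero(results == test_r)
print("accuracy: %.1f%%" % (100.0 * correct / test_r.size))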

But anyway, this is a good starting point for beginners (I hope so).
