How can I use the Keras OCR example?
Problem description
I found examples/image_ocr.py, which seems to be for OCR. Hence it should be possible to give the model an image and receive text. However, I have no idea how to do so. How do I feed the model with a new image? Which kind of preprocessing is necessary?
Install the dependencies:
- Install cairocffi: sudo apt-get install python-cairocffi
- Install editdistance: sudo -H pip install editdistance
- Change train to return the model and save the trained model.
- Run the script to train the model.
Now I have a model.h5. What's next?
See https://github.com/MartinThoma/algorithms/tree/master/ML/ocr/keras for my current code. I know how to load the model (see below) and this seems to work. The problem is that I don't know how to feed new scans of images with text to the model.
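- What is CTC? Connectionist Temporal Classification?
- Are there algorithms which reliably detect the rotation of a document?
- Are there algorithms which reliably detect lines / text blocks / tables / images (and hence give a reasonable segmentation)? I guess edge detection with smoothing and line-wise histograms already works reasonably well for that?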
#!/usr/bin/env python
from keras import backend as K
import keras
from keras.models import load_model
import os
from image_ocr import ctc_lambda_func, create_model, TextImageGenerator
from keras.layers import Lambda
from keras.utils.data_utils import get_file
import scipy.ndimage
import numpy
img_h = 64
img_w = 512
pool_size = 2
words_per_epoch = 16000
val_split = 0.2
val_words = int(words_per_epoch * (val_split))
if K.image_data_format() == 'channels_first':
    input_shape = (1, img_w, img_h)
else:
    input_shape = (img_w, img_h, 1)
fdir = os.path.dirname(get_file('wordlists.tgz',
                                origin='http://www.mythic-ai.com/datasets/wordlists.tgz', untar=True))
img_gen = TextImageGenerator(monogram_file=os.path.join(fdir, 'wordlist_mono_clean.txt'),
                             bigram_file=os.path.join(fdir, 'wordlist_bi_clean.txt'),
                             minibatch_size=32,
                             img_w=img_w,
                             img_h=img_h,
                             downsample_factor=(pool_size ** 2),
                             val_split=words_per_epoch - val_words
                             )
print("Input shape: {}".format(input_shape))
model, _, _ = create_model(input_shape, img_gen, pool_size, img_w, img_h)
model.load_weights("my_model.h5")
x = scipy.ndimage.imread('example.png', mode='L').transpose()
x = x.reshape(x.shape + (1,))
# Does not work
print(model.predict(x))
This gives:
2017-07-05 22:07:58.695665: I tensorflow/core/common_runtime/gpu/gpu_device.cc:996] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN Black, pci bus id: 0000:01:00.0)
Traceback (most recent call last):
  File "eval_example.py", line 45, in <module>
    print(model.predict(x))
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 1567, in predict
    check_batch_axis=False)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 106, in _standardize_input_data
    'Found: array with shape ' + str(data.shape))
ValueError: The model expects 4 arrays, but only received one array. Found: array with shape (512, 64, 1)
Recommended answer
Well, I will try to answer everything you asked here:
As commented in the OCR code, Keras doesn't support losses with multiple parameters, so the example calculates the CTC loss in a Lambda layer. What does this mean in this case?
The neural network may look confusing because it takes 4 inputs ([input_data, labels, input_length, label_length]) and produces loss_out as its output. Apart from input_data, everything else is used only to calculate the loss, which means it is needed only for training. What we want is something like line 468 of the original code:
Model(inputs=input_data, outputs=y_pred).summary()
which means "I have an image as input, please tell me what is written here". So how do we achieve that?
1) Keep the original training code as it is and train normally;
2) After training, save the model Model(inputs=input_data, outputs=y_pred) in a .h5 file so that it can be loaded wherever you want;
3) Do the prediction: if you take a look at the code, the input image is transposed and scaled to [0, 1], so you can use this code to make it easy:
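import numpy as np
# Note: scipy.misc.imread and imresize are not available in recent SciPy releases.
from scipy.misc import imread, imresize

# Use the width and height from your neural network here.
width, height = 512, 64  # img_w and img_h from the question's script

def load_for_nn(img_file):
    image = imread(img_file, flatten=True)    # load as grayscale
    image = imresize(image, (height, width))  # resize to the network input size
    image = image.T                           # the network expects (width, height)
    images = np.ones((1, width, height))  # change 1 to the number of images you want to predict; here I predict just one
    images[0] = image
    images = images[:, :, :, np.newaxis]  # add the channel axis
    images /= 255                         # scale pixel values to [0, 1]
    return images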
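Because the scipy.misc image helpers are gone from current SciPy, the same preprocessing can be sketched with Pillow and NumPy. This is only a sketch under assumptions: the function names to_network_input and load_for_nn_pillow are hypothetical, the model is assumed to use channels_last input of shape (width, height, 1) as in the question's script, and Pillow's default resampling is not guaranteed to match what imresize produced.

```python
import numpy as np


def to_network_input(gray):
    """Turn a (height, width) grayscale array with values 0..255 into a
    batch of one image shaped (1, width, height, 1) with values in [0, 1]."""
    arr = np.asarray(gray, dtype=np.float32) / 255.0  # scale to [0, 1]
    arr = arr.T                                       # (width, height)
    return arr[np.newaxis, :, :, np.newaxis]          # add batch and channel axes


def load_for_nn_pillow(img_file, width=512, height=64):
    """Hypothetical Pillow-based stand-in for load_for_nn."""
    from PIL import Image  # imported lazily so the array helper works without Pillow

    # convert("L") gives 8-bit grayscale; Pillow's resize takes (width, height).
    img = Image.open(img_file).convert("L").resize((width, height))
    return to_network_input(img)
```

The returned array already carries the leading batch axis, so it can be passed to model.predict directly; the missing batch axis is one of the two problems in the question's script (the other being the three extra training-only inputs).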
With the image loaded, let's do the prediction:
def predict_image(image_path):  # insert the path of your image
    image = load_for_nn(image_path)  # load with the snippet above
    raw_word = model.predict(image)  # run the prediction with the neural network
    # The output of the neural network is only numbers. Use decode_output
    # from image_ocr.py to get the desired string.
    final_word = decode_output(raw_word)[0]
    return final_word
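The answer does not show decode_output itself. A minimal sketch of one common way to turn the network output into text is greedy (best-path) CTC decoding: take the most likely class at each time step, merge consecutive repeats, then drop the blank symbol. Everything named here is an assumption for illustration: greedy_ctc_decode, ALPHABET and BLANK are hypothetical, and the alphabet and blank index must match whatever the model was trained with.

```python
from itertools import groupby

# Assumed label set: lowercase letters plus space, with the CTC blank
# as the extra last class. Adjust to the alphabet used during training.
ALPHABET = "abcdefghijklmnopqrstuvwxyz "
BLANK = len(ALPHABET)


def greedy_ctc_decode(probs):
    """probs: sequence of time steps, each a sequence of class scores.
    Returns the decoded string for one sample."""
    # Most likely class index at every time step.
    best = [max(range(len(step)), key=step.__getitem__) for step in probs]
    # Merge consecutive repeats, then remove blanks.
    collapsed = [k for k, _ in groupby(best)]
    return "".join(ALPHABET[k] for k in collapsed if k != BLANK)


def one_hot(index, size=len(ALPHABET) + 1):
    """Helper for the usage example: a score vector with a single 1.0."""
    return [1.0 if i == index else 0.0 for i in range(size)]


# "h h blank i i" collapses to "hi".
steps = [one_hot(7), one_hot(7), one_hot(BLANK), one_hot(8), one_hot(8)]
print(greedy_ctc_decode(steps))
```

Since model.predict returns one score matrix per image in the batch, a batch would be decoded with something like [greedy_ctc_decode(sample) for sample in raw_word].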
This should be enough. From my experience, the images used in training are not good enough to produce good predictions. If necessary, I will later release code that uses other datasets, which improved my results.
Answering the related questions:
- What is CTC? Connectionist Temporal Classification?
It is a technique used to improve sequence classification. The original paper shows that it improves results when recognizing what is said in audio. In this case the sequence is a sequence of characters. The explanation is a bit tricky, but you can find a good one here.
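To make the idea a little more concrete: CTC scores a label sequence by summing the probabilities of every frame-by-frame alignment that collapses to it, where collapsing means merging consecutive repeats and then removing the blank symbol. The brute-force sketch below only illustrates that definition on tiny inputs; the names collapse and ctc_probability are hypothetical, and real implementations use dynamic programming rather than enumeration.

```python
from itertools import groupby, product


def collapse(path, blank):
    """Merge consecutive repeats, then remove blanks."""
    return tuple(k for k, _ in groupby(path) if k != blank)


def ctc_probability(probs, label, blank):
    """Sum the probability of every alignment that collapses to `label`.
    probs[t][k] is the probability of class k at time step t."""
    n_classes = len(probs[0])
    total = 0.0
    for path in product(range(n_classes), repeat=len(probs)):
        if collapse(path, blank) == tuple(label):
            p = 1.0
            for t, k in enumerate(path):
                p *= probs[t][k]
            total += p
    return total


# Two time steps, two classes: 0 is the letter "a", 1 is the blank.
uniform = [[0.5, 0.5], [0.5, 0.5]]
print(ctc_probability(uniform, [0], blank=1))  # alignments aa, a-, -a
print(ctc_probability(uniform, [], blank=1))   # alignment --
```

The training loss is the negative logarithm of this probability for the correct label, which is why the training model needs the labels and lengths as extra inputs while prediction does not.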
- Are there algorithms which reliably detect the rotation of a document?
I am not sure, but you could take a look at attention mechanisms in neural networks. I don't have a good link right now, but I know they could be applicable here.
- Are there algorithms which reliably detect lines / text blocks / tables / images (and hence give a reasonable segmentation)? I guess edge detection with smoothing and line-wise histograms already works reasonably well for that?
OpenCV implements Maximally Stable Extremal Regions (known as MSER). I really like the results of this algorithm; it is fast and was good enough for me when I needed it.
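The line-wise histogram idea raised in the question can be sketched in a few lines, independently of MSER. This is an illustration under simplifying assumptions, not a robust segmenter: the page is assumed to be already binarized and deskewed, the image is a plain list of rows where 1 marks ink, and the function name find_text_lines and the min_ink threshold are hypothetical.

```python
def find_text_lines(image, min_ink=1):
    """image: list of rows, each a list of 0/1 values (1 = ink).
    Returns (start_row, end_row_exclusive) for each run of rows
    whose ink count reaches min_ink."""
    profile = [sum(row) for row in image]  # row-wise histogram of ink pixels
    lines, start = [], None
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y                      # a text line begins
        elif ink < min_ink and start is not None:
            lines.append((start, y))       # the text line ends
            start = None
    if start is not None:                  # line running to the bottom edge
        lines.append((start, len(profile)))
    return lines


page = [
    [0, 0, 0],
    [0, 1, 1],
    [1, 1, 0],
    [0, 0, 0],
    [0, 1, 0],
]
print(find_text_lines(page))
```

Each returned row range could then be cropped and resized to the network's input size before being passed to the recognizer.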
As I said before, I will release the code soon. When I do, I will edit the question to add the repository, but I believe the information here is enough to get the example running.