Can I test Tesseract OCR from the Windows command line?

Question

I am new to Tesseract OCR. I converted an image to TIFF and tried to run Tesseract on it from cmd in Windows to see the output, but I couldn't get it to work. Can you help me? What command should I use?

Here is my sample image:

Recommended answer

The simplest tesseract.exe syntax is tesseract.exe inputimage output-text-file. This assumes that tesseract.exe has been added to the PATH environment variable. You can add the -psm N argument if the text in your image is particularly hard to recognize.
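If you want to script the call rather than type it by hand, the syntax above can be sketched as a small Python helper that only assembles the argument list. This is a minimal sketch, not part of Tesseract: the function name build_tesseract_command and its parameters are made up for illustration, and it assumes the 3.x-style -psm switch described in this answer (4.x and later releases spell it --psm, hence the psm_flag parameter).

```python
from typing import List, Optional, Sequence

# Page segmentation modes 0..10, as listed in the tesseract 3.02 usage text.
VALID_PSM = range(0, 11)


def build_tesseract_command(
    image: str,
    outputbase: str,
    lang: Optional[str] = None,
    psm: Optional[int] = None,
    configfiles: Sequence[str] = (),
    exe: str = "tesseract.exe",  # assumption: tesseract.exe is on PATH
    psm_flag: str = "-psm",      # assumption: 3.x spelling; use "--psm" on 4.x+
) -> List[str]:
    """Return the argument list for one tesseract invocation (hypothetical helper)."""
    cmd = [exe, image, outputbase]
    if lang is not None:
        cmd += ["-l", lang]
    if psm is not None:
        if psm not in VALID_PSM:
            raise ValueError(f"pagesegmode must be between 0 and 10, got {psm}")
        cmd += [psm_flag, str(psm)]
    # Per the usage text, -l and -psm must occur before any config file.
    cmd += list(configfiles)
    return cmd


if __name__ == "__main__":
    # Simplest form: input image and output base name only.
    print(build_tesseract_command("ECL8R.png", "out"))
    # Harder text: treat the image as a single uniform block (mode 6).
    print(build_tesseract_command("ECL8R.png", "out", lang="eng", psm=6))
```

The resulting list can be passed to subprocess.run as-is. Keeping it as a list rather than one string avoids quoting problems with paths that contain spaces.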

The regular syntax (without any -psm switch) works well enough on the image you attached, unless you need a higher level of accuracy.

Note that non-English characters (such as the symbol next to "prescription") are not recognized; my default installation contains only the English training data.

Here's the tesseract syntax description:

C:\Users\vish\Desktop>tesseract.exe
Usage:tesseract.exe imagename outputbase [-l lang] [-psm pagesegmode] [configfile...]

pagesegmode values are:
0 = Orientation and script detection (OSD) only.
1 = Automatic page segmentation with OSD.
2 = Automatic page segmentation, but no OSD, or OCR
3 = Fully automatic page segmentation, but no OSD. (Default)
4 = Assume a single column of text of variable sizes.
5 = Assume a single uniform block of vertically aligned text.
6 = Assume a single uniform block of text.
7 = Treat the image as a single text line.
8 = Treat the image as a single word.
9 = Treat the image as a single word in a circle.
10 = Treat the image as a single character.
-l lang and/or -psm pagesegmode must occur before anyconfigfile.

Single options:
  -v --version: version info
  --list-langs: list available languages for tesseract engine

And here's the output for your image (note: the image was converted to PNG when I downloaded it):

C:\Users\vish\Desktop>tesseract.exe ECL8R.png out.txt
Tesseract Open Source OCR Engine v3.02 with Leptonica

C:\Users\vish\Desktop>type out.txt.txt
1 Project Background

A prescription (R) is a written order by a physician or medical doctor to a pharmacist in the form of
medication instructions for an individual patient. You can't get prescription medicines unless someone
with authority prescribes them. Usually, this means a written prescription from your doctor. Dentists,

optometrists, midwives and nurse practitioners may also be authorized to prescribe medicines for you.

It can also be defined as an order to take certain medications.

A prescription has legal implications; this means the prescriber must assume his responsibility for the
clinical care ofthe patient.

Recently, the term "prescriptionΓÇ¥ has known a wider usage being used for clinical assessments,
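Two details in the transcript above are easy to trip over. First, tesseract appends .txt to the output base name, which is why passing out.txt produced out.txt.txt. Second, the ΓÇ¥ in the last line is most likely not an OCR error: tesseract writes UTF-8, and a closing curly quote shows up that way when its bytes are displayed in a legacy console code page. The sketch below illustrates both points. The helper names are hypothetical, and code page 437 is an assumption about the console used here.

```python
import subprocess
from pathlib import Path


def output_text_path(outputbase: str) -> Path:
    """Tesseract appends ".txt" to the output base, so "out.txt" becomes "out.txt.txt"."""
    return Path(outputbase + ".txt")


def render_in_codepage(text: str, codepage: str = "cp437") -> str:
    """Show how UTF-8 text looks when its bytes are read in a legacy code page.

    cp437 is an assumption about the console in the transcript.
    """
    return text.encode("utf-8").decode(codepage)


def ocr_to_text(image: str, outputbase: str = "out", exe: str = "tesseract.exe") -> str:
    """Run tesseract (assumed to be on PATH) and return the recognized text."""
    subprocess.run([exe, image, outputbase], check=True)
    # Decode as UTF-8 instead of relying on the console code page.
    return output_text_path(outputbase).read_text(encoding="utf-8")


if __name__ == "__main__":
    print(output_text_path("out.txt"))   # out.txt.txt
    print(render_in_codepage("\u201d"))  # ΓÇ¥
```

Passing a base name without an extension (for example out) gives a plain out.txt, and reading the file as UTF-8 in a script or editor avoids the garbled quote that the type command shows.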
