Android: Extract Text from Image

Question

I am working on an application that needs to convert a JPEG image to text so that I can identify the text written in the image. Please give me some guidance on how to do that.

Recommended answer

Excerpt from "Making OCR app using Tesseract":

Note: These instructions are for Android SDK r19 and Android NDK r7c. On 64-bit Ubuntu, you may need to install the ia32-libs 32-bit compatibility library. You also need to add the SDK and NDK directories to your PATH variable (see Troubleshooting).

Download the source or clone this git repository. This project contains tools for compiling the Tesseract, Leptonica, and JPEG libraries for use on Android. It contains an Eclipse Android library project that provides a Java API for accessing natively compiled Tesseract and Leptonica APIs. You don't need the eyes-two code; you can do without it.

Build this project using these commands (here, tess-two is the directory inside the top-level tess-two checkout, the one at the same level as tess-two-test):

cd <project-directory>/tess-two
ndk-build
android update project --path .
ant release

Now import the project as a library in Eclipse.

File -> Import -> Existing Projects into Workspace -> tess-two directory. Right-click the project, Android Tools -> Fix Project Properties. Right-click -> Properties -> Android -> check Is Library.

Configure your project to use the tess-two project as a library project:

Right click your project name -> Properties -> Android -> Library -> Add, and choose tess-two. 

You’re now ready to OCR any image using the library.

First, we need to get the picture itself. For that, I found some simple code to capture the image here. Once we have the bitmap, we just need to perform the OCR, which is relatively easy. Be sure to correct the rotation and image type by doing something like:

// _path = path to the image to be OCRed
// bitmap = the Bitmap already decoded from _path
// Note: the ExifInterface constructor throws IOException
ExifInterface exif = new ExifInterface(_path);
int exifOrientation = exif.getAttributeInt(
        ExifInterface.TAG_ORIENTATION,
        ExifInterface.ORIENTATION_NORMAL);

int rotate = 0;

switch (exifOrientation) {
    case ExifInterface.ORIENTATION_ROTATE_90:
        rotate = 90;
        break;
    case ExifInterface.ORIENTATION_ROTATE_180:
        rotate = 180;
        break;
    case ExifInterface.ORIENTATION_ROTATE_270:
        rotate = 270;
        break;
}

if (rotate != 0) {
    int w = bitmap.getWidth();
    int h = bitmap.getHeight();

    // Build the rotation matrix
    Matrix mtx = new Matrix();
    mtx.preRotate(rotate);

    // Rotate the bitmap
    bitmap = Bitmap.createBitmap(bitmap, 0, 0, w, h, mtx, false);
}
// Convert to ARGB_8888, which Tesseract requires
bitmap = bitmap.copy(Bitmap.Config.ARGB_8888, true);

Now we have the image in the bitmap, and we can simply use the TessBaseAPI to run the OCR like:

TessBaseAPI baseApi = new TessBaseAPI();
// DATA_PATH = path to the directory that contains the "tessdata" subdirectory
// lang = language for which traineddata exists, usually "eng"
baseApi.init(DATA_PATH, lang);
// E.g. baseApi.init("/mnt/sdcard/tesseract/", "eng");
// which expects /mnt/sdcard/tesseract/tessdata/eng.traineddata to exist
baseApi.setImage(bitmap);
String recognizedText = baseApi.getUTF8Text();
baseApi.end();

(You can download the language files from here and put them in a directory on your device, either manually or in code.)
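
Before calling init, it can help to confirm that the language file is where Tesseract will look for it. The following is a minimal sketch in plain Java, assuming the layout DATA_PATH/tessdata/<lang>.traineddata; the class TessDataCheck and its methods are hypothetical helpers written for illustration and are not part of tess-two.

```java
import java.io.File;

/** Hypothetical helper (not part of tess-two) for checking the language data layout. */
final class TessDataCheck {
    private TessDataCheck() {}

    /** Builds the path DATA_PATH/tessdata/<lang>.traineddata. */
    static File trainedDataFile(String dataPath, String lang) {
        return new File(new File(dataPath, "tessdata"), lang + ".traineddata");
    }

    /** Returns true if the language file exists and is not empty. */
    static boolean isReady(String dataPath, String lang) {
        File f = trainedDataFile(dataPath, lang);
        return f.isFile() && f.length() > 0;
    }
}
```

A caller might run TessDataCheck.isReady(DATA_PATH, lang) first and only call baseApi.init(DATA_PATH, lang) when it returns true, showing a download prompt otherwise.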

Now that you've got the OCRed text in the variable recognizedText, you can do pretty much anything with it: translate, search, anything! P.S. You can add support for other languages by having a preference and then downloading the required language data file from here. You might even put them in the assets folder and copy them to the SD card on start.
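
The "copy from assets on start" idea could look roughly like the sketch below. It is plain Java so it can be tried outside Android; TrainedDataInstaller is a hypothetical name, and the target layout DATA_PATH/tessdata/<lang>.traineddata is an assumption. It uses try-with-resources, so it assumes a Java 7 language level.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical helper (not part of tess-two) that installs a language file once. */
final class TrainedDataInstaller {
    private TrainedDataInstaller() {}

    /**
     * Copies the stream to dataPath/tessdata/<lang>.traineddata unless a
     * non-empty file is already there. The source stream is always closed.
     */
    static File install(InputStream source, String dataPath, String lang) throws IOException {
        try (InputStream in = source) {
            File dir = new File(dataPath, "tessdata");
            if (!dir.isDirectory() && !dir.mkdirs()) {
                throw new IOException("Cannot create directory " + dir);
            }
            File target = new File(dir, lang + ".traineddata");
            if (target.isFile() && target.length() > 0) {
                return target; // already installed, keep the existing file
            }
            try (OutputStream out = new FileOutputStream(target)) {
                byte[] buffer = new byte[8192];
                int read;
                while ((read = in.read(buffer)) != -1) {
                    out.write(buffer, 0, read);
                }
            }
            return target;
        }
    }
}
```

On Android, the stream would typically come from something like getAssets().open("tessdata/eng.traineddata"); that asset path is only an example and depends on where you place the file in your project.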

Troubleshooting

  • About updating PATH: you need to update your PATH variable for the commands to work; otherwise you will see a "command not found" error. For the Android SDK, add the location of the SDK's tools and platform-tools directories to your PATH environment variable. For the Android NDK, use the same process to add the android-ndk directory to the PATH variable.
  • Maven-ising: check this post by James Elsey (http://www.jameselsey.co.uk/blogs/techblog/tesseract-ocr-on-android-is-easier-if-you-maven-ise-it-works-on-windows-too/). He also mentions that he got it working on Windows without any problems.
  • You may also try searching this page (Ctrl+F) for your problem; someone might have already encountered it and posted a solution in the comments.
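
For the PATH item above, a quick way to see which build commands are not reachable is to scan the PATH directories yourself. This is a minimal sketch in plain Java; PathCheck is a hypothetical helper, and it only looks for exact file names, so on Windows (where tools carry extensions such as .cmd or .bat) it would need adjusting.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper that reports which commands are not found on a PATH value. */
final class PathCheck {
    private PathCheck() {}

    /** Returns the commands for which no PATH directory contains a file of that name. */
    static List<String> missing(String pathValue, String... commands) {
        String[] dirs = pathValue == null
                ? new String[0]
                : pathValue.split(Pattern.quote(File.pathSeparator));
        List<String> notFound = new ArrayList<>();
        for (String command : commands) {
            boolean found = false;
            for (String dir : dirs) {
                if (!dir.isEmpty() && new File(dir, command).isFile()) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                notFound.add(command);
            }
        }
        return notFound;
    }
}
```

For the build steps in this answer, a call such as PathCheck.missing(System.getenv("PATH"), "ndk-build", "android", "ant") would list whichever of the three commands still need their directory added to PATH.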
