Export HOCR output for Tesseract OCR in Android

Problem description

I am trying to use tess-two, a fork of Tesseract Tools for Android. I want to turn on hOCR output in Tesseract. Following this link, I tried setting the variable tessedit_create_hocr to true, but I can't see any hOCR in the output. Here is my attempt:

  baseApi.init(FileUtil.getAppFolder(), "eng", TessBaseAPI.OEM_TESSERACT_CUBE_COMBINED);
  baseApi.setVariable("tessedit_create_hocr", "1");
  baseApi.setImage(bitmap);
  String recognizedText = baseApi.getUTF8Text();

Somebody told me the hOCR output should be in the config folder or in the folder containing the image, but I don't see anything there. I also don't know how to configure the file name and location of the hOCR output.

Another thing: is there any way to apply a config file to Tesseract Tools for Android? I put the config files into the tessdata/config folder, but nothing happens. How do I tell Tesseract that it should read these config files? There does not seem to be enough documentation for Android.

Update: Thanks to @nguyenq, I can now get hOCR data. Here is what I added:

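  jstring Java_com_googlecode_tesseract_android_TessBaseAPI_nativeGetHOCRText(JNIEnv *env,
                                                                              jobject thiz,
                                                                              jint page) {
    native_data_t *nat = get_native_data(env, thiz);

    // GetHOCRText returns a heap-allocated string that the caller owns.
    char *text = nat->api.GetHOCRText(page);
    jstring result = env->NewStringUTF(text);

    // Tesseract allocates the string with new[], so release it with delete[] rather than free().
    delete[] text;

    return result;
  }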

Solution

Apparently, tess-two does not implement the whole TessBaseAPI: it does not include support for the native GetHOCRText method. You may have to extend the wrapper yourself to access the functions you need.
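
When extending the wrapper, the C++ function name has to match what the JVM looks up for the Java-side `native` method. The following is a minimal sketch of the JNI short-name mangling rule, written in plain Java so it can be run anywhere. `JniNames`, `mangle`, and `shortName` are hypothetical helper names for illustration; they are not part of tess-two or of the JNI API.

```java
// Sketch of the JNI short-name rule: "Java_" + mangled class name + "_" + mangled method name.
// JniNames is a hypothetical helper for illustration only.
final class JniNames {

    // Escape one class or method name according to the JNI naming convention.
    static String mangle(String name) {
        StringBuilder out = new StringBuilder();
        for (char c : name.toCharArray()) {
            if (c == '.' || c == '/') {
                out.append('_');             // package separators become underscores
            } else if (c == '_') {
                out.append("_1");            // a literal underscore is escaped
            } else if (c == ';') {
                out.append("_2");
            } else if (c == '[') {
                out.append("_3");
            } else if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')) {
                out.append(c);               // ASCII letters and digits pass through
            } else {
                out.append(String.format("_0%04x", (int) c)); // other characters as _0xxxx
            }
        }
        return out.toString();
    }

    // Name of the native function for a method declared in the given class.
    static String shortName(String className, String methodName) {
        return "Java_" + mangle(className) + "_" + mangle(methodName);
    }
}
```

For a method `nativeGetHOCRText` declared in `com.googlecode.tesseract.android.TessBaseAPI`, `JniNames.shortName(...)` yields `Java_com_googlecode_tesseract_android_TessBaseAPI_nativeGetHOCRText`, which is the function name used in the question's update. On the Java side, the class would then need a matching declaration such as `private native String nativeGetHOCRText(int page);`, presumably with a public method that forwards to it.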

The config files are meant for command-line execution. Alternatively, you can set the necessary variables through the exposed API method setVariable.
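
If you still want to keep settings in a config file, one option is to read the file yourself and forward each entry to `setVariable`. The sketch below assumes the usual Tesseract config layout (one `name value` pair per line, with `#` starting a comment line). `VariableSetter` and `TessConfigLoader` are hypothetical names introduced here, not part of tess-two; the interface only mirrors the shape of `setVariable(String, String)` and assumes it returns a boolean.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

// Hypothetical callback with the same shape as a setVariable(String, String) method.
interface VariableSetter {
    boolean setVariable(String name, String value);
}

// Hypothetical helper: applies "name value" lines from a config file through the callback.
final class TessConfigLoader {

    // Returns the number of variables the callback reported as accepted.
    static int apply(Reader config, VariableSetter setter) throws IOException {
        int applied = 0;
        BufferedReader in = new BufferedReader(config);
        String line;
        while ((line = in.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and comment lines
            }
            // Split into the variable name and the rest of the line as its value.
            String[] parts = line.split("\\s+", 2);
            if (parts.length < 2) {
                continue; // a name without a value is ignored in this sketch
            }
            if (setter.setVariable(parts[0], parts[1].trim())) {
                applied++;
            }
        }
        return applied;
    }
}
```

With Java 8 method references, usage could look like `TessConfigLoader.apply(new FileReader(configPath), baseApi::setVariable)`, where `configPath` is a placeholder for your own file location; on older toolchains an anonymous `VariableSetter` works the same way. Some Tesseract variables can only be set at initialization time, so a rejected entry does not necessarily mean the file is malformed.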
