Extracting PDF text in Objective-C


Question

Until now, I had not found a solution that works well for extracting text from a PDF file in Objective-C on the iPhone. I found some standard C code, modified it to work, and thought I would share it here, since I have used Stack Overflow quite a bit but never given back. You can get it here: https://github.com/zachron/pdfiphone

It takes the path of a PDF file as input and returns an NSString containing the text in the PDF. I did not write most of it, but I did modify it so that it works with the iPhone and Objective-C. You need to include the zlib library in your project (libz.dylib on the iPhone). If someone takes this and makes it more awesome, that is good times.
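To give a rough idea of what this kind of extraction involves, here is a minimal sketch in plain C, which also compiles inside an Objective-C source file. It is not the code from the pdfiphone repository. The function name `extract_literal_text` and its parameters are hypothetical, and it assumes the content stream has already been decompressed (for example with zlib's inflate). It only collects literal strings written as `( ... )` and starts a new line after each `Tj` or `TJ` operator. Hex strings, octal escapes, and font encodings are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not taken from the pdfiphone project.
 * Copies the text of PDF literal strings "( ... )" found in an
 * already-decompressed content stream into out, and writes '\n'
 * after each Tj / TJ text-showing operator.
 * Limitations of this sketch: hex strings <...> and octal escapes
 * (\ddd) are not decoded, and no font encoding is applied.
 * Returns the number of bytes written, excluding the final NUL. */
static size_t extract_literal_text(const char *src, size_t len,
                                   char *out, size_t cap)
{
    size_t n = 0;
    int depth = 0;   /* parenthesis nesting; 0 means outside a string */

    assert(src != NULL && out != NULL);
    if (cap == 0)
        return 0;

    /* Each iteration writes at most one byte, so n + 1 < cap
     * always leaves room for the terminating NUL. */
    for (size_t i = 0; i < len && n + 1 < cap; i++) {
        char c = src[i];

        if (depth == 0) {
            if (c == '(') {
                depth = 1;                 /* a literal string starts */
            } else if (c == 'T' && i + 1 < len &&
                       (src[i + 1] == 'j' || src[i + 1] == 'J')) {
                out[n++] = '\n';           /* end of one text-showing op */
                i++;
            }
            continue;
        }

        /* Inside a literal string. */
        if (c == '\\' && i + 1 < len) {
            char e = src[++i];
            switch (e) {
            case 'n': out[n++] = '\n'; break;
            case 'r': out[n++] = '\r'; break;
            case 't': out[n++] = '\t'; break;
            default:  out[n++] = e;    break;   /* \( \) \\ and the rest */
            }
        } else if (c == '(') {
            depth++;                       /* balanced nested parenthesis */
            out[n++] = c;
        } else if (c == ')') {
            if (--depth > 0)
                out[n++] = c;              /* closes a nested parenthesis */
        } else {
            out[n++] = c;
        }
    }
    out[n] = '\0';
    return n;
}

/* Usage:
 *   const char *s = "BT (Hello) Tj [(Wor) -20 (ld)] TJ ET";
 *   char text[64];
 *   extract_literal_text(s, strlen(s), text, sizeof text);
 *   // text now holds "Hello\nWorld\n"
 */
```

In an Objective-C project, the resulting buffer could then be wrapped with `[NSString stringWithUTF8String:]`, assuming the extracted bytes are plain ASCII or valid UTF-8. Real PDFs often use font-specific encodings, so treat this as an illustration of the idea and not a complete extractor.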

Recommended answer

Keep in mind that this only works for text that is stored as text in the PDF. It won't OCR scanned PDFs. If you want to do that, one option is Tesseract, Google's robust FOSS OCR engine. It compiles on the iPhone: see Nolan Brown's Tesseract-iPhone-Demo for a working example. The imaging library ImageMagick also compiles on the iPhone, and it lets you convert a PDF to TIFF, which Tesseract accepts as input.
