PDF special searching iOS

Question

I know there is a great library for searching PDFs on iOS: PDFKitten.

However, I have come across some PDF files that it cannot search. When I open these files in the Preview app on the Mac and search there, it works.

I uploaded one file here.

You can check this by opening the file in Preview and searching for the word 'ra'; it works perfectly. But if you add the file to the PDFKitten project, configure the project to open it, and then search, nothing is found.

I inspected the source: it handles all the text-showing operators, namely Tj, ', ", and TJ. I added log statements to the callbacks for these operators and found that the callbacks are never called.

Can you give me any suggestions or ideas?

Solution

If I understand the code correctly, PDFKitten looks for fonts only in the /Font entry of the page's /Resources dictionary. At least that is my reading of the Scanner method fontCollectionWithPage, whose result is queried by setFont in pdfScannerCallbacks to set the current font object.

Furthermore, there is no callback for the Do operator (the operator that injects the contents of an XObject resource into the page content). Unless CGPDFScannerScan interprets this operator under the hood, the content of included XObjects is not scanned at all. This would match your observation that the text-showing operator callbacks never get called.
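
To illustrate why a missing Do handler would silence the text callbacks, here is a toy model in Python. It is not PDFKitten's code and does not use Core Graphics: the `scan` function, the pre-tokenized streams, the resource name `Xf1`, and the dictionary layout are all assumptions made up for this sketch.

```python
# Toy model of an operator-callback scanner. Content streams are
# pre-tokenized as (operands, operator) pairs; this is NOT Core Graphics.

def scan(stream, resources, callbacks):
    """Invoke the callback registered for each operator, if there is one."""
    for operands, operator in stream:
        handler = callbacks.get(operator)
        if handler is not None:
            handler(operands, resources, callbacks)


def make_callbacks(found, follow_xobjects):
    """Build a callback table that appends shown strings to `found`."""

    def show_text(operands, resources, callbacks):
        found.append(operands[0])

    def invoke_xobject(operands, resources, callbacks):
        xobject = resources["XObject"][operands[0]]
        if xobject.get("Subtype") == "Form":
            # Scan the form's own stream with the form's own resources.
            scan(xobject["stream"], xobject.get("Resources", {}), callbacks)

    callbacks = {"Tj": show_text}
    if follow_xobjects:
        callbacks["Do"] = invoke_xobject
    return callbacks


# Hypothetical page: the page stream only invokes a form XObject,
# and the form's stream holds the actual text.
form = {
    "Subtype": "Form",
    "Resources": {"Font": {"F1": "Helvetica"}},
    "stream": [(["F1", 12], "Tf"), (["ra"], "Tj")],
}
page_resources = {"XObject": {"Xf1": form}}
page_stream = [(["Xf1"], "Do")]

without_do, with_do = [], []
scan(page_stream, page_resources, make_callbacks(without_do, follow_xobjects=False))
scan(page_stream, page_resources, make_callbacks(with_do, follow_xobjects=True))
print(without_do)  # []
print(with_do)     # ['ra']
```

With no handler registered for Do, the scanner in this model never descends into the form, so the Tj callback has nothing to fire on. Whether the real CGPDFScannerScan behaves the same way would need to be checked against PDFKitten itself.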

Your file mundo1.pdf, though, has no immediate /Font entry in the /Resources dictionaries of its pages. Instead, all the actual content of each page is wrapped in a single /XObject resource. These XObjects in turn have their own /Resources dictionaries, each containing a /Font entry that defines the fonts used on the respective page.
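
The difference between a page-level font lookup and one that also descends into form XObjects can be sketched in Python with plain dictionaries standing in for PDF objects. The function names, the resource names (`Xf1`, `F1`, `F2`), and the font values are hypothetical; they are not taken from mundo1.pdf or from PDFKitten.

```python
def page_level_fonts(resources):
    """Fonts listed directly in a resource dictionary's /Font entry."""
    return dict(resources.get("Font", {}))


def all_fonts(resources, _seen=None):
    """Fonts from this dictionary plus those of nested form XObjects."""
    seen = set() if _seen is None else _seen
    fonts = page_level_fonts(resources)
    for xobject in resources.get("XObject", {}).values():
        if xobject.get("Subtype") != "Form" or id(xobject) in seen:
            continue  # skip images and forms already visited
        seen.add(id(xobject))
        fonts.update(all_fonts(xobject.get("Resources", {}), seen))
    return fonts


# Hypothetical page shaped like the one described above: no /Font entry
# at page level, one form XObject carrying its own /Resources.
page_resources = {
    "XObject": {
        "Xf1": {
            "Subtype": "Form",
            "Resources": {"Font": {"F1": "Helvetica", "F2": "Times-Roman"}},
        }
    }
}

print(page_level_fonts(page_resources))   # {}
print(sorted(all_fonts(page_resources)))  # ['F1', 'F2']
```

Merging everything into one dictionary is a simplification. Resource names are local to each resource dictionary, so a real scanner would need to keep the fonts scoped to the XObject that declares them.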

Thus, PDFKitten knows nothing about the fonts used in your file, in particular their encodings, and so cannot extract the text from the PDF content. It may not even get to see the content it would need to interpret.

I would therefore suggest that you post this issue on the PDFKitten issue tracker.

By the way, this PDF construct fully conforms to the PDF specification. Nonetheless, it looks like an inadequate use of the iText library. The author of the software that uses iText this way should review the code and switch to better-suited iText classes.
