How do I get character offset information from a PDF document?
Question
I'm trying to implement search result highlighting for PDFs in a web app. I have the original PDFs, and small PNG versions that are used in search results. Essentially I'm looking for an API like:
pdf_document.find_offsets('somestring')
# => { top: 501, left: 100, bottom: 520, right: 150 }, { ... another box ... }, ...
I know it's possible to get this information out of a PDF because Apple's Preview.app implements this.
I need something that runs on Linux and is ideally open source. I'm aware you can do this with Acrobat on Windows.
Recommended answer
Take a look at PDFlib TET: http://www.pdflib.com/products/tet/
(It is not free.)
Fabrizio
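Whatever tool does the extraction, the API sketched in the question comes down to matching the search string against the extracted character stream and merging the boxes of the matched characters. Below is a minimal Python sketch of that step only. It is not PDFlib TET's API: `CharBox`, `find_offsets`, `scale_box`, and `demo_chars` are hypothetical names, and the input is assumed to be a flat list of single characters whose boxes use a top-left origin with y growing downward, as in the question's example output.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class CharBox:
    """One extracted character and its box (hypothetical input format)."""
    char: str      # exactly one character, so list indices match string indices
    left: float
    top: float
    right: float
    bottom: float


def find_offsets(chars: List[CharBox], needle: str) -> List[Dict[str, float]]:
    """Return one merged bounding box per occurrence of `needle`."""
    text = "".join(c.char for c in chars)
    boxes = []
    start = text.find(needle)
    while needle and start != -1:
        span = chars[start:start + len(needle)]
        # Merge the per-character boxes into a single enclosing rectangle.
        boxes.append({
            "top": min(c.top for c in span),
            "left": min(c.left for c in span),
            "bottom": max(c.bottom for c in span),
            "right": max(c.right for c in span),
        })
        start = text.find(needle, start + 1)
    return boxes


def scale_box(box: Dict[str, float], factor: float) -> Dict[str, int]:
    """Scale a box from page units to PNG pixels (same origin assumed)."""
    return {key: round(value * factor) for key, value in box.items()}


def demo_chars(text: str, left: float = 100.0, top: float = 500.0,
               width: float = 10.0, height: float = 20.0) -> List[CharBox]:
    """Build made-up boxes for one line of fixed-width text (demo data only)."""
    return [
        CharBox(ch, left + i * width, top, left + (i + 1) * width, top + height)
        for i, ch in enumerate(text)
    ]


hits = find_offsets(demo_chars("say hi hi"), "hi")
thumbnail_hits = [scale_box(box, 0.25) for box in hits]
print(hits)
print(thumbnail_hits)
```

The scale factor would be the PNG width divided by the page width in the extractor's units. A match that wraps across a line break comes back as one box covering both lines, so a real implementation would probably want to split such matches per line.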