Extract metadata like comments and bookmarks from PDF using PHP
Problem description
I need to analyze the comments and bookmarks of several PDF files in my PHP application. Is there any way to extract this information?
All I need are the bookmark names and their hierarchy, plus the comment contents and their coordinates.
I would prefer a PHP library, but I could also install additional software on the server and call it with exec().
Recommended answer
OK, https://github.com/smalot/pdfparser seems to be able to extract bookmarks as well as annotations. At least it provides a huge array containing the desired data.
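// Assumes the library was installed with: composer require smalot/pdfparser
require __DIR__ . '/vendor/autoload.php';
$parser = new \Smalot\PdfParser\Parser();
$pdf = $parser->parseFile('document.pdf');
// Dumps every parsed PDF object, outline items and annotations included
print_r($pdf->getObjects());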
All I have to do now is figure out how to process this array...
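One possible way to process that array, sketched under assumptions. In the PDF format, bookmarks form an outline tree: each item carries a /Title, points to its first child through /First and to its next sibling through /Next. Annotations are dictionaries (usually with /Type /Annot) whose text is in /Contents and whose position is /Rect, given as [llx lly urx ury] in PDF user space with the origin at the bottom-left of the page. The plain-PHP sketch below assumes you have already converted the parser's objects into a hypothetical `$objects` array keyed by object id, holding those keys with references stored as ids. That conversion step, the sample data, and the function names `walkOutline()` and `summarizeAnnotations()` are illustrative only and are not part of smalot/pdfparser.

```php
<?php
// Hypothetical input: PDF objects already converted to plain arrays,
// keyed by object id, with object references stored as ids.
$objects = [
    10 => ['First' => 11], // outline root (the catalog's /Outlines dictionary)
    11 => ['Title' => 'Chapter 1', 'First' => 12, 'Next' => 13],
    12 => ['Title' => 'Section 1.1'],
    13 => ['Title' => 'Chapter 2'],
    20 => ['Type' => 'Annot', 'Subtype' => 'Text',
           'Contents' => 'Check this', 'Rect' => [100, 700, 120, 720]],
];

/**
 * Depth-first walk of an outline tree starting at item $id.
 * Returns a flat list of ['title' => ..., 'depth' => ...] in reading order.
 * $seen stops the walk if a malformed /Next or /First chain loops back.
 */
function walkOutline(array $objects, $id, int $depth = 0, array &$seen = []): array
{
    $result = [];
    while ($id !== null && isset($objects[$id]) && !isset($seen[$id])) {
        $seen[$id] = true;
        $item = $objects[$id];
        $result[] = ['title' => (string) ($item['Title'] ?? ''), 'depth' => $depth];
        // Children start at /First, one level deeper.
        if (isset($item['First'])) {
            foreach (walkOutline($objects, $item['First'], $depth + 1, $seen) as $child) {
                $result[] = $child;
            }
        }
        // Siblings are chained through /Next.
        $id = $item['Next'] ?? null;
    }
    return $result;
}

/**
 * Collects annotation text and converts /Rect [llx, lly, urx, ury]
 * into x/y (bottom-left corner) plus width/height, in PDF user space units.
 */
function summarizeAnnotations(array $objects): array
{
    $result = [];
    foreach ($objects as $id => $obj) {
        if (($obj['Type'] ?? null) !== 'Annot' || !isset($obj['Rect'])) {
            continue;
        }
        [$llx, $lly, $urx, $ury] = array_map('floatval', $obj['Rect']);
        $result[] = [
            'id'       => $id,
            'subtype'  => $obj['Subtype'] ?? null,
            'contents' => (string) ($obj['Contents'] ?? ''),
            'x'        => min($llx, $urx),
            'y'        => min($lly, $ury),
            'width'    => abs($urx - $llx),
            'height'   => abs($ury - $lly),
        ];
    }
    return $result;
}

// Usage: print the bookmark tree, indenting two spaces per nesting level.
foreach (walkOutline($objects, $objects[10]['First']) as $bookmark) {
    echo str_repeat('  ', $bookmark['depth']), $bookmark['title'], PHP_EOL;
}
print_r(summarizeAnnotations($objects));
```

The outline is walked depth-first so the titles come out in reading order, with a depth counter that preserves the hierarchy. Filtering annotations on /Type is a simplification: that entry is optional for annotation dictionaries, so a more thorough version would read each page's /Annots array instead.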