How can I extract images and their metadata from PDFs?
Question
Is it possible to use Java to extract images from a PDF file and export them to a specific folder without losing their original creation and modification dates? I tried to do this with iText and PDFBox but had no success. Any ideas or examples are welcome.
Recommended answer
Images in a PDF do not contain metadata. They are stored as raw data, which needs to be assembled into images. I wrote two blog posts explaining how image data is stored in a PDF file: http://www.jpedal.org/PDFblog/2010/04/understanding-the-pdf-file-format-how-are-images-stored/ and http://www.jpedal.org/PDFblog/2010/09/understanding-the-pdf-file-format-images/
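The following minimal Java sketch illustrates what "raw data which needs to be assembled into images" means in practice. It uses only the standard library. It turns a buffer of already-decoded samples into a BufferedImage and writes it as a PNG to a folder.

The sketch assumes the simplest case:
- The image XObject has /ColorSpace /DeviceRGB and /BitsPerComponent 8, so each pixel is three bytes.
- Rows run top to bottom.
- The stream's filters (for example /FlateDecode) have already been removed.

The class name, method name, output file name and sample bytes are hypothetical.

```java
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.imageio.ImageIO;

public class RawImageAssembler {

    /**
     * Assembles decoded PDF image samples into a BufferedImage.
     * Assumption: DeviceRGB, 8 bits per component, i.e. R, G, B bytes per pixel,
     * left to right within a row and rows from top to bottom.
     * Other colour spaces, bit depths or masks need different unpacking.
     */
    public static BufferedImage assembleRgb8(byte[] samples, int width, int height) {
        int expected = width * height * 3;
        if (samples.length < expected) {
            throw new IllegalArgumentException(
                "expected " + expected + " bytes of samples, got " + samples.length);
        }
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        int i = 0;
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                // Mask with 0xFF because Java bytes are signed.
                int r = samples[i++] & 0xFF;
                int g = samples[i++] & 0xFF;
                int b = samples[i++] & 0xFF;
                img.setRGB(x, y, (r << 16) | (g << 8) | b);
            }
        }
        return img;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical decoded stream for a 2x1 image: one red pixel, one blue pixel.
        byte[] samples = { (byte) 255, 0, 0, 0, 0, (byte) 255 };
        BufferedImage img = assembleRgb8(samples, 2, 1);

        // Export to a folder (a temporary directory here, for illustration).
        Path dir = Files.createTempDirectory("extracted-images");
        Path out = dir.resolve("image-1.png");
        ImageIO.write(img, "png", out.toFile());

        // The file's timestamps are assigned by the file system at write time;
        // they are not copied from the PDF.
        System.out.println(out + " last modified " + Files.getLastModifiedTime(out));
    }
}
```

In PDFBox 2.x, `PDImageXObject.getImage()` performs this decoding and assembly step for you, including filters and colour spaces, and returns a BufferedImage. Either way, the exported file only has the dates it receives when it is written.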