Extract images from a PDF without resampling, in Python?

Question

How might one extract all images from a PDF document at native resolution and format? (Meaning: extract TIFF as TIFF, JPEG as JPEG, etc., without resampling.) Layout is unimportant; I don't care where the source image is located on the page.

I'm using Python 2.7 but can use 3.x if required.

Answer

Often in a PDF, the image is simply stored as-is. For example, a PDF with a JPG inserted will have a range of bytes somewhere in the middle that, when extracted, is a valid JPG file. You can use this to pull such images out very simply, by copying those byte ranges out of the PDF. I wrote about this some time ago, with sample code: Extracting JPGs from PDFs.
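The byte-range idea can be sketched in Python 3 roughly as follows. This is a minimal illustration, not the code from the post mentioned above. It assumes each JPEG sits verbatim between a `stream` and `endstream` keyword (that is, the stream uses only the DCTDecode filter, with no extra compression such as FlateDecode layered on top), and it locates the image by the JPEG start-of-image (`FF D8`) and end-of-image (`FF D9`) markers. It is a heuristic scan, not a PDF parser. The function name `extract_jpegs` and the `jpgN.jpg` output names are hypothetical. Other image types (for example Flate-compressed pixel data) are not stored as standalone files and would need a PDF library to decode.

```python
import sys
from pathlib import Path

SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker


def extract_jpegs(pdf: bytes) -> list[bytes]:
    """Return raw JPEG byte ranges found verbatim inside PDF streams.

    Heuristic sketch: assumes each JPEG is stored as-is (DCTDecode only)
    between the 'stream' and 'endstream' keywords.
    """
    images = []
    pos = 0
    while True:
        istream = pdf.find(b"stream", pos)
        if istream < 0:
            break
        body_start = istream + len(b"stream")
        iend = pdf.find(b"endstream", body_start)
        if iend < 0:
            break
        # The JPEG should begin right after the keyword and its EOL (LF or CRLF),
        # so only look a few bytes into the stream body.
        istart = pdf.find(SOI, body_start, min(body_start + 20, iend))
        if istart >= 0:
            # Take the last end-of-image marker before 'endstream'.
            ieoi = pdf.rfind(EOI, istart, iend)
            if ieoi >= 0:
                images.append(pdf[istart:ieoi + len(EOI)])
        # Skip past this stream so its 'endstream' is not re-matched.
        pos = iend + len(b"endstream")
    return images


def main(path: str) -> None:
    data = Path(path).read_bytes()
    for n, jpg in enumerate(extract_jpegs(data)):
        out = Path(f"jpg{n}.jpg")  # hypothetical output naming scheme
        out.write_bytes(jpg)
        print(f"wrote {out} ({len(jpg)} bytes)")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Run it as `python extract_jpgs.py input.pdf` (script name hypothetical). Because the bytes are copied unchanged, each output file has exactly the resolution and encoding that was embedded in the PDF.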