PDFBox: working with very large PDFs

Question

I am working with some very large PDFs, some over 7 GB in size. The PDFs have up to 20,000 pages and many full-page color images. I'd like to use PDFBox to work with the PDFs, but because of their size I get an OutOfMemoryError when I attempt to open them.

I'm working with version pdfbox-app-1.6.0 on Windows 7, using IntelliJ and Java 6.

First I tried writing a simple program that just opened the PDF in a PDDocument and copied each page over to another PDDocument: http://ideone.com/arKhB

Next I tried using the PDFBox CopyDoc example.

Both examples run out of memory.

I'm assuming this is because PDFBox is trying to read the whole document into memory. Is there a way to have it open only one page at a time? I know processing would be slower, but at the moment I can't process anything.

Recommended answer

In the 2.0.* versions, open the PDF like this:

PDDocument doc = PDDocument.load(file, MemoryUsageSetting.setupTempFileOnly());

This sets up memory buffering to use only temporary file(s) (no main memory), with no restriction on size.
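
To illustrate why buffering in a temporary file keeps heap usage low, here is a minimal sketch in plain Java. It is not PDFBox's implementation, and ScratchFileSketch with its append and read methods are hypothetical names made up for this example. It only shows the general idea: data is appended to a file on disk, and only the slice that is currently needed is read back into memory.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Illustration only: a scratch buffer that lives in a temp file instead of the heap.
// ScratchFileSketch is a hypothetical name, not a PDFBox class.
class ScratchFileSketch implements AutoCloseable {
    private final File file;
    private final RandomAccessFile raf;

    ScratchFileSketch() throws IOException {
        file = File.createTempFile("scratch", ".tmp");
        file.deleteOnExit();
        raf = new RandomAccessFile(file, "rw");
    }

    // Append a chunk to the end of the file and return the offset where it starts.
    long append(byte[] chunk) throws IOException {
        long offset = raf.length();
        raf.seek(offset);
        raf.write(chunk);
        return offset;
    }

    // Read back `length` bytes starting at `offset`; only this slice sits on the heap.
    byte[] read(long offset, int length) throws IOException {
        byte[] out = new byte[length];
        raf.seek(offset);
        raf.readFully(out);
        return out;
    }

    @Override
    public void close() throws IOException {
        raf.close();
        file.delete();
    }

    public static void main(String[] args) throws IOException {
        try (ScratchFileSketch scratch = new ScratchFileSketch()) {
            long first = scratch.append(new byte[] {1, 2, 3});
            long second = scratch.append(new byte[] {4, 5});
            byte[] slice = scratch.read(second, 2);
            System.out.println(first + " " + second + " " + slice[0]);
        }
    }
}
```

The trade-off is the one the question already anticipates: disk access is slower than main memory, but the amount of data that can be buffered is no longer bounded by the Java heap.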

Update 17.4.2018: More tricks to save memory are described in the FAQ. Not yet described there, but available since 2.0.9, is subsampling (skipping pixel lines/rows) with PDFRenderer.setSubsamplingAllowed(true) when rendering. This saves memory for PDF files with huge images.
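
As a rough illustration of what subsampling means, the sketch below uses the standard javax.imageio API rather than PDFBox itself. SubsamplingSketch, readSubsampled, and samplePng are hypothetical names for this example, and the blank PNG is generated in memory only so the code is self-contained. The decoder keeps every step-th pixel in each direction, so the decoded image, and the memory it needs, shrinks accordingly.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;
import javax.imageio.ImageReadParam;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

// Illustration only: subsampled image decoding with javax.imageio (not PDFBox code).
class SubsamplingSketch {
    // Decode an encoded image, keeping only every `step`-th pixel in each direction.
    static BufferedImage readSubsampled(byte[] encoded, int step) throws IOException {
        try (ImageInputStream in =
                 ImageIO.createImageInputStream(new ByteArrayInputStream(encoded))) {
            ImageReader reader = ImageIO.getImageReaders(in).next();
            try {
                reader.setInput(in);
                ImageReadParam param = reader.getDefaultReadParam();
                // Same sampling period horizontally and vertically, no offset.
                param.setSourceSubsampling(step, step, 0, 0);
                return reader.read(0, param);
            } finally {
                reader.dispose();
            }
        }
    }

    // Build a blank PNG of the given size so the example needs no input file.
    static byte[] samplePng(int width, int height) throws IOException {
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ImageIO.write(img, "png", out);
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        ImageIO.setUseCache(false); // keep ImageIO's own stream cache in memory
        BufferedImage small = readSubsampled(samplePng(400, 200), 4);
        System.out.println(small.getWidth() + "x" + small.getHeight());
    }
}
```

With a step of 4, the decoded image has roughly one sixteenth of the original pixel count, which is why this kind of option matters for documents made of full-page images.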
