PDFBox 2 unusual memory consumption

Question

We are trying to render images from several PDF files using PDFRenderer's renderImageWithDPI method. On one particular PDF, the renderer behaves differently on some pages.

Rendering these pages takes far longer than rendering other, similar pages, and memory consumption reaches unusually high values: the memory used by the process grows by about 50 MB every 1-2 seconds, until the application process is using around 5 GB of RAM while inside renderImageWithDPI. Once the thread finishes renderImageWithDPI, memory consumption drops by 1.5-2 GB almost immediately. Because of the high memory consumption, a Java heap space error (OutOfMemoryError) is sometimes thrown.

The pages on which this happens are not visibly different from the others: they have the same width, height, and size on disk. Rendering is done at 250 DPI with ImageType.RGB. The application is also running with the "-Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider" parameter.
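For a sense of scale, the output image itself can be estimated from the page size and the DPI. The sketch below is only a rough calculation, not PDFBox code: it assumes an A4 page (595 x 842 points, which the question does not state), the default PDF unit of 1/72 inch, and 4 bytes per pixel for an int-packed RGB raster. The class and method names are made up for illustration, and PDFBox's exact rounding may differ.

```java
// Rough estimate of the raster size produced by rendering one page.
// Assumptions (not from the question): A4 page, 4 bytes per pixel.
final class RenderedImageSize {

    // PDF user space defaults to 72 units per inch.
    static int pixels(double points, double dpi) {
        return (int) Math.ceil(points * dpi / 72.0);
    }

    // Width in pixels * height in pixels * bytes per pixel.
    static long estimateBytes(double widthPt, double heightPt, double dpi, int bytesPerPixel) {
        return (long) pixels(widthPt, dpi) * pixels(heightPt, dpi) * bytesPerPixel;
    }

    public static void main(String[] args) {
        long bytes = estimateBytes(595, 842, 250, 4);
        System.out.println("Estimated raster size in bytes: " + bytes);
    }
}
```

Under these assumptions the rendered image is on the order of tens of megabytes, so the final image alone would not account for gigabytes of heap.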

Is this a memory leak or expected behaviour? Also, could somebody explain why some pages use 2 GB of memory and take a minute to render, while others render in a couple of seconds?

Recommended answer

Analysis of the PDF shows that page 34 has over 10000 XObject elements, almost all of them CMYK images. You can see this yourself with the PDFDebugger command line app: go to page 34, then Resources, then XObject. Converting CMYK images is not very fast in Java. The memory usage is most likely because we cache these images. You can observe that the next time the page is shown, it renders much faster. The FAQ shows how to disable the cache.
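One way to check the caching explanation yourself is to time the same render twice and compare elapsed time and heap growth. The helper below is a minimal sketch using only the JDK. RenderProbe and its members are hypothetical names, the Runnable stands in for your own renderImageWithDPI call, and the heap figure is only indicative because garbage collection can run at any point.

```java
// Measures wall-clock time and approximate heap growth of a task.
// The task is a placeholder for a page render; nothing here calls PDFBox.
final class RenderProbe {

    static final class Result {
        final long elapsedMillis;
        final long heapDeltaBytes;

        Result(long elapsedMillis, long heapDeltaBytes) {
            this.elapsedMillis = elapsedMillis;
            this.heapDeltaBytes = heapDeltaBytes;
        }
    }

    // Heap currently in use by the JVM (allocated minus free).
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    static Result measure(Runnable task) {
        long heapBefore = usedHeap();
        long start = System.nanoTime();
        task.run();
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        return new Result(elapsedMillis, usedHeap() - heapBefore);
    }

    public static void main(String[] args) {
        // Stand-in workload; replace with a render of the slow page.
        Runnable renderPage = () -> java.util.Arrays.fill(new byte[8 * 1024 * 1024], (byte) 1);
        Result first = measure(renderPage);
        Result second = measure(renderPage);
        System.out.println("first:  " + first.elapsedMillis + " ms, " + first.heapDeltaBytes + " bytes");
        System.out.println("second: " + second.elapsedMillis + " ms, " + second.heapDeltaBytes + " bytes");
    }
}
```

If the second render of the same page is much faster and the heap stays high until the document is closed, that is consistent with caching rather than a leak.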

I also get a speed improvement (21 seconds instead of 89 seconds) by using this option: -Dorg.apache.pdfbox.rendering.UsePureJavaCMYKConversion=true. However, image quality may be very slightly different; see PDFBOX-3569 for a discussion.
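If changing the JVM command line is inconvenient, the same property can probably be set from code. This is a sketch under the assumption that PDFBox reads the property only when it first needs CMYK conversion, so the call has to happen before any rendering starts. CmykConversionConfig is a hypothetical helper name; only the property key comes from the answer.

```java
// Sets the system property named in the answer from inside the application.
// Assumption: this runs before PDFBox performs any CMYK conversion.
final class CmykConversionConfig {

    static final String PROPERTY =
            "org.apache.pdfbox.rendering.UsePureJavaCMYKConversion";

    static void enablePureJavaCmykConversion() {
        System.setProperty(PROPERTY, "true");
    }

    public static void main(String[] args) {
        enablePureJavaCmykConversion();
        System.out.println(PROPERTY + "=" + System.getProperty(PROPERTY));
    }
}
```

Passing the -D flag at startup remains the safer choice, since it cannot be set too late.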
