Read pdf uploadstream one page at a time with Java


Question

I am trying to read a PDF document in a J2EE application.

For a web application I have to store PDF documents on disk. To make searching easy, I want to build a reverse index of the text inside each document, if the document contains OCR text.

With the PDFBox library it's possible to create a document object which contains an entire PDF file. However, to conserve memory and improve overall performance, I'd rather handle the document as a stream and read one page at a time into a buffer.

I wonder whether it is possible to read a file stream containing a PDF page by page, or even one line at a time.

Recommended answer

In the 2.0.* versions, open the PDF like this:

PDDocument doc = PDDocument.load(file, MemoryUsageSetting.setupTempFileOnly());

This sets up memory buffering to use only temporary file(s) (no main memory), with no size restriction.

This was answered here.
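
To connect this with the reverse index mentioned in the question, here is a minimal sketch of an index that is fed one page of text at a time, so only the current page's text has to be held in memory. The class `PageIndex` and its methods are hypothetical names made up for illustration and are not part of PDFBox; the tokenizer is deliberately naive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper (not part of PDFBox): maps each word to the pages it appears on.
class PageIndex {
    private final Map<String, SortedSet<Integer>> postings = new TreeMap<>();

    // Index the text of a single page. The caller can discard pageText afterwards.
    void addPage(int pageNumber, String pageText) {
        // Naive tokenizer: lower-case, then split on anything that is not a letter or digit.
        for (String token : pageText.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(pageNumber);
            }
        }
    }

    // Page numbers containing the word, ascending; an empty list if the word was never seen.
    List<Integer> pagesContaining(String word) {
        SortedSet<Integer> pages = postings.get(word.toLowerCase(Locale.ROOT));
        return pages == null ? new ArrayList<>() : new ArrayList<>(pages);
    }
}
```

With PDFBox 2.0.*, the per-page text could plausibly come from `PDFTextStripper`: loop `n` from 1 to `doc.getNumberOfPages()`, call `setStartPage(n)` and `setEndPage(n)` (page numbers are 1-based), then pass the result of `getText(doc)` to `addPage(n, text)`. Note that this limits how much extracted text is held at once; it does not by itself make PDFBox parse the file as a stream.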
