Catch PDFBox warnings when loading erroneous PDFs


Problem description

When loading a PDF with PDFBox, one gets log-level warnings if the PDF is erroneous:

    PDDocument doc = PDDocument.load (new File (filename));

For example, this could lead to the following output on the console:

    Dez 08, 2020 9:14:41 AM org.apache.pdfbox.pdfparser.COSParser validateStreamLength
    WARNING: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 3141, length: 1674, expected end position: 4815

Obviously, the PDF has some errors in the content stream, but it does load into doc. Is it possible to catch these warnings programmatically with PDFBox? Are there any properties that report the warnings after the document has been loaded?

I've tried PDFBox-Preflight, but that checks for PDF/A compliance, which produces many more messages.

Recommended answer

Try the non-lenient mode of the parser. This code is from the ShowSignature.java example:

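    // PDFBox 2.x imports
    import org.apache.pdfbox.io.RandomAccessBufferedFileInputStream;
    import org.apache.pdfbox.pdfparser.PDFParser;
    import org.apache.pdfbox.pdmodel.PDDocument;

    RandomAccessBufferedFileInputStream raFile = new RandomAccessBufferedFileInputStream(file);
    // If your files are not too large, you can also download the PDF into a byte array
    // with IOUtils.toByteArray() and pass a RandomAccessBuffer() object to the
    // PDFParser constructor.
    PDFParser parser = new PDFParser(raFile);
    // Turn off the parser's lenient (repairing) mode before parsing.
    parser.setLenient(false);
    parser.parse();
    PDDocument document = parser.getPDDocument();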
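
If the goal is to read the warning messages themselves rather than to reject the file, one possible complement is to listen on the logger. The console format in the question suggests that PDFBox's logging ends up in java.util.logging; that is an assumption, and the sketch below will see nothing if another logging backend (for example Log4j) is bound instead. `WarningCollector`, `attach`, `detach` and `getWarnings` are hypothetical names for this illustration, not part of PDFBox.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

/**
 * Hypothetical helper (not part of PDFBox): collects records of level
 * WARNING or higher published to a logger or to any of its child loggers.
 */
final class WarningCollector extends Handler {
    // Strong reference: java.util.logging only holds loggers weakly.
    private final Logger logger;
    private final List<String> warnings =
            Collections.synchronizedList(new ArrayList<>());

    private WarningCollector(Logger logger) {
        this.logger = logger;
    }

    /** Attaches a new collector to the named logger, e.g. "org.apache.pdfbox". */
    static WarningCollector attach(String loggerName) {
        WarningCollector collector = new WarningCollector(Logger.getLogger(loggerName));
        collector.logger.addHandler(collector);
        return collector;
    }

    @Override
    public void publish(LogRecord record) {
        // Keep only WARNING and SEVERE records.
        if (record.getLevel().intValue() >= Level.WARNING.intValue()) {
            warnings.add(record.getLoggerName() + ": " + record.getMessage());
        }
    }

    @Override
    public void flush() {
        // Nothing is buffered.
    }

    @Override
    public void close() {
        // No resources to release.
    }

    /** Stops collecting by removing this handler from its logger. */
    void detach() {
        logger.removeHandler(this);
    }

    /** Returns a snapshot of the warnings collected so far. */
    List<String> getWarnings() {
        synchronized (warnings) {
            return new ArrayList<>(warnings);
        }
    }
}
```

Under that assumption, one would call `WarningCollector.attach("org.apache.pdfbox")` before `PDDocument.load(...)`, inspect `getWarnings()` once loading has finished, and then call `detach()`. The handler is global to the logger, so warnings from documents loaded concurrently on other threads would land in the same list.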
