Catch PDFBox warnings when loading erroneous PDFs
Question
When loading a PDF with PDFBox, one gets log-level warnings if the PDF is erroneous:
PDDocument doc = PDDocument.load(new File(filename));
For example, this could lead to the following output on the console:
Dez 08, 2020 9:14:41 AM org.apache.pdfbox.pdfparser.COSParser validateStreamLength
WARNING: The end of the stream doesn't point to the correct offset, using workaround to read the stream, stream start position: 3141, length: 1674, expected end position: 4815
Obviously, the PDF has some errors in the content stream, but it does load into doc. Is it possible to catch these warnings programmatically with PDFBox? Are there any properties that report the warnings after the document has been loaded?
I've tried PDFBox-Preflight, but that checks for PDF/A compliance, which leads to many more messages.
Recommended answer
Try the non-lenient mode of the parser. This code is from the ShowSignature.java example:
RandomAccessBufferedFileInputStream raFile = new RandomAccessBufferedFileInputStream(file);
// If your files are not too large, you can also download the PDF into a byte array
// with IOUtils.toByteArray() and pass a RandomAccessBuffer() object to the
// PDFParser constructor.
PDFParser parser = new PDFParser(raFile);
parser.setLenient(false);
parser.parse();
PDDocument document = parser.getPDDocument();
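If the document should still load in lenient mode and the goal is only to observe the warnings, a different technique is to attach a handler to the logger. The sketch below is not taken from the PDFBox examples. It assumes that PDFBox logs through Apache Commons Logging backed by java.util.logging (the format of the console output in the question suggests this) and that logger names follow the class names, for example org.apache.pdfbox.pdfparser.COSParser. WarningCollector is a hypothetical helper name, and the warning in main is a stand-in for the real PDDocument.load call so that the snippet runs without PDFBox on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

/** Hypothetical helper: collects WARNING-or-higher records from a java.util.logging logger. */
class WarningCollector extends Handler {
    private final List<String> messages = new ArrayList<>();

    @Override
    public synchronized void publish(LogRecord record) {
        // Keep only warnings and errors; ignore INFO and below.
        if (record.getLevel().intValue() >= Level.WARNING.intValue()) {
            messages.add(record.getLoggerName() + ": " + record.getMessage());
        }
    }

    @Override
    public void flush() {
        // Nothing is buffered outside the list.
    }

    @Override
    public void close() {
        // No resources to release.
    }

    /** Returns a copy of the messages collected so far. */
    public synchronized List<String> getMessages() {
        return new ArrayList<>(messages);
    }

    public static void main(String[] args) {
        // Assumption: PDFBox loggers live under the "org.apache.pdfbox" namespace.
        Logger pdfboxLogger = Logger.getLogger("org.apache.pdfbox");
        WarningCollector collector = new WarningCollector();
        pdfboxLogger.addHandler(collector);
        try {
            // In real code this would be: PDDocument.load(new File(filename));
            // Stand-in: emit a warning the way a child logger would.
            Logger.getLogger("org.apache.pdfbox.pdfparser.COSParser")
                  .warning("The end of the stream doesn't point to the correct offset");
        } finally {
            // Detach the handler so later loads are not recorded by this collector.
            pdfboxLogger.removeHandler(collector);
        }
        System.out.println("Collected warnings: " + collector.getMessages());
    }
}
```

The handler is attached to the parent logger, so records from any child logger under org.apache.pdfbox reach it as long as parent handlers are enabled, which is the java.util.logging default. If another logging backend such as Log4j is on the classpath, Commons Logging may route messages there instead and this handler would see nothing.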