iText "Coordinate outside allowed range" exception using LocationTextExtractionStrategy

Question

An exception "Coordinate outside allowed range" is thrown when I try to use LocationTextExtractionStrategy.

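// Extract the text of every page with a LocationTextExtractionStrategy
for (int pageNum = 1; pageNum <= document.getNumberOfPages(); pageNum++)
{
    PdfPage page = document.getPage(pageNum);
    // For PDF 1 the exception is thrown from this call
    sb.append(PdfTextExtractor.getTextFromPage(page, new LocationTextExtractionStrategy()));
}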

More information about the exception:

java.lang.IllegalStateException: Coordinate outside allowed range

    at com.itextpdf.kernel.pdf.canvas.parser.clipper.ClipperBase.rangeTest(ClipperBase.java:76)

I have 2 similar PDFs generated by the same software, in the first the exception is thrown, in the second not.

PDF 1 (exception)

PDF 2 (OK)

What is the problem with the first PDF?

Answer

(According to your stack trace, you are using an iText 7.x version. I updated your question tags accordingly and reproduced the issue using the current iText 7.1.2-SNAPSHOT.)

Both of your PDFs use extreme y coordinates (beyond the ISO 32000-1 implementation limits) to define clip paths. PDF 1 is merely about twice as extreme as PDF 2, and the iText clip path routines start hiccuping somewhere in between.

The page content stream of page 1 of PDF 1 essentially looks like this:

q
[...]
% modifyCTM
0.802969 0 0 -0.802969 0 842 cm
[...]
q
0 0 741 98417 re W n
[...]
Q
q
0 0 741 98417 re W n
[...]
Q
q
0 0 741 98417 re W n
[...]
Q
q
0 0 741 98417 re W n
[...]
Q
q
0 0 741 98417 re W n
[...]
Q
q
0 0 741 98417 re W n
[...]
Q
Q

Thus, even taking the initial modification of the CTM into account, you six times define clip path rectangles with a height of 98417 * 0.802969 default user units, which is approximately 79026 default user units.
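As a rough check of this arithmetic, the following standalone Java sketch applies the CTM from the content stream to two corners of the "0 0 741 98417 re" rectangle, using the PDF matrix convention x' = a*x + c*y + e and y' = b*x + d*y + f. It does not use iText; the class and method names are hypothetical and only serve the illustration.

```java
// Standalone sketch, no iText involved: transforms the clip rectangle of PDF 1
// by the CTM set with "0.802969 0 0 -0.802969 0 842 cm".
final class ClipRectHeight {

    // PDF matrix [a b c d e f]: x' = a*x + c*y + e, y' = b*x + d*y + f
    static double[] transform(double[] m, double x, double y) {
        return new double[] { m[0] * x + m[2] * y + m[4], m[1] * x + m[3] * y + m[5] };
    }

    // Height in default user units of the rectangle "0 0 w h re" after applying the CTM;
    // only valid for CTMs without rotation or skew (b = c = 0), as in this file
    static double transformedHeight(double[] ctm, double w, double h) {
        double[] lowerLeft = transform(ctm, 0, 0);
        double[] upperRight = transform(ctm, w, h);
        return Math.abs(upperRight[1] - lowerLeft[1]);
    }

    public static void main(String[] args) {
        double[] ctm = { 0.802969, 0, 0, -0.802969, 0, 842 };
        double[] lowerLeft = transform(ctm, 0, 0);
        double[] upperRight = transform(ctm, 741, 98417);
        System.out.printf("y range: %.1f .. %.1f%n", upperRight[1], lowerLeft[1]);
        System.out.printf("height: %.1f default user units%n", transformedHeight(ctm, 741, 98417));
    }
}
```

Because the matrix flips the y axis and moves the origin to y = 842, the rectangle spans from y = 842 down to roughly y = -78184 in default user space, and the printed height is roughly 79026 units.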

ISO 32000-1 Annex C.2, "Architectural limits", on the other hand states:


conforming readers should accommodate PDF files that obey the constraints.

[...]


  • In default user space, the minimum page size should be 3 by 3 units; the maximum should be 14,400 by 14,400 units.

Thus, your clip path rectangles are more than five times as high as the largest page a conforming reader is expected to support. Consequently, a conforming reader need not support your extreme clip paths.

PDF 2 is built similarly; the clip paths in question are merely 41879 * 0.802969 units high, i.e. about 33628 units, which is only a bit more than twice the height that needs to be supported. For some reason, iText still appears to handle this.
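The comparison with the Annex C.2 maximum can be sketched the same way for both files. This hypothetical helper assumes, as the figures above do, that PDF 2 uses the same 0.802969 scale factor as PDF 1; it only does arithmetic and does not inspect any PDF.

```java
// Standalone arithmetic sketch: compares the clip rectangle heights of both PDFs
// with the maximum page size of ISO 32000-1 Annex C.2.
final class PageLimitCheck {
    // Maximum page size in default user units per ISO 32000-1 Annex C.2
    static final double MAX_PAGE_SIZE = 14_400;
    // |d| of the "cm" operator; assumed to be the same in PDF 1 and PDF 2
    static final double SCALE = 0.802969;

    static double heightInDefaultUserUnits(double rectHeight) {
        return rectHeight * SCALE;
    }

    static double timesMaxPageSize(double rectHeight) {
        return heightInDefaultUserUnits(rectHeight) / MAX_PAGE_SIZE;
    }

    public static void main(String[] args) {
        // Heights given to the "re" operator in PDF 1 and PDF 2
        double[] rectHeights = { 98417, 41879 };
        for (int i = 0; i < rectHeights.length; i++) {
            System.out.printf("PDF %d: %.0f units high, %.2f times the maximum page size%n",
                    i + 1, heightInDefaultUserUnits(rectHeights[i]), timesMaxPageSize(rectHeights[i]));
        }
    }
}
```

For PDF 1 this gives about 5.5 times the maximum page size and for PDF 2 about 2.3 times, in line with the "more than five times" and "more than twice" statements.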

You can work around this by changing the value of com.itextpdf.kernel.pdf.canvas.parser.clipper.ClipperBridge.floatMultiplier, which is declared as follows:

/**
 * Since the clipper library uses integer coordinates, we should convert
 * our floating point numbers into fixed point numbers by multiplying by
 * this coefficient. Vary it to adjust the preciseness of the calculations.
 */
public static double floatMultiplier = Math.pow(10, 14);

You can try, e.g., Math.pow(10, 10), which works for me with both of your files.
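To illustrate why the size of floatMultiplier decides whether the exception occurs, here is a standalone Java simulation; it does not call iText. It rests on two assumptions that the answer does not state: that iText's clipper package keeps the full-range limit of the original Clipper library it is ported from (0x3FFFFFFFFFFFFFFF, about 4.6e18), and that it is the default user space coordinates of the clip path that get multiplied by floatMultiplier when converted to integer coordinates (see the javadoc above). The names ClipperRangeSketch and fitsClipperRange are hypothetical.

```java
// Standalone simulation, no iText involved: checks whether scaled clip path
// coordinates stay within the integer range accepted by the Clipper library.
final class ClipperRangeSketch {
    // ASSUMPTION: full-range limit of the original Clipper library, which
    // iText's com.itextpdf.kernel.pdf.canvas.parser.clipper package is ported from
    static final long HI_RANGE = 0x3FFFFFFFFFFFFFFFL;

    // Scales a default user space coordinate to fixed point, as the floatMultiplier
    // javadoc describes, and tests it against the assumed limit
    static boolean fitsClipperRange(double coordinate, double floatMultiplier) {
        double scaled = Math.abs(coordinate * floatMultiplier);
        return scaled <= HI_RANGE;
    }

    public static void main(String[] args) {
        // Lowest y of the clip rectangles in default user space: 842 - height * 0.802969
        double[] extremeY = { 842 - 98417 * 0.802969, 842 - 41879 * 0.802969 };
        double[] multipliers = { Math.pow(10, 14), Math.pow(10, 10) };
        for (double multiplier : multipliers) {
            for (int i = 0; i < extremeY.length; i++) {
                System.out.printf("floatMultiplier %.0e, PDF %d: %s%n", multiplier, i + 1,
                        fitsClipperRange(extremeY[i], multiplier) ? "fits" : "outside allowed range");
            }
        }
    }
}
```

Under these assumptions the sketch reports PDF 1 as outside the allowed range and PDF 2 as fitting with a multiplier of 10^14, and both as fitting with 10^10, which is consistent with the behavior described above. The workaround itself is a single assignment, ClipperBridge.floatMultiplier = Math.pow(10, 10), made before text extraction. Since the field is static, the change applies to all subsequent clipping calculations in the process, and, as the javadoc notes, a smaller multiplier also reduces the precision of those calculations.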

That being said, ISO 32000-2 appears to have dropped this specific page size limit; there are merely more generic limits plus statements such as "a particular PDF processor running on a particular device and in a particular operating environment will always have practical limits".

Thus, @iText should consider whether the current limits are such practical limits or should be relaxed.
