Traverse whole PDF and remove underlines of hyperlinks (annotations) only + iText


Problem description

I have successfully changed the color of the underlines using the code from the question linked below. Can anyone help me remove these underlines from the PDF instead? I located the underlines with that same code.

Traverse whole PDF and change blue color to black (change color of underlines as well) + iText

Below is my code, which finds the hyperlinks and changes their color to black. I need to modify this code so that it removes the underlines.

PdfCanvasEditor editor = new PdfCanvasEditor() {
    @Override
    protected void write(PdfCanvasProcessor processor, PdfLiteral operator, List<PdfObject> operands)
    {
        String operatorString = operator.toString();

        if (SET_FILL_RGB.equals(operatorString) && operands.size() == 4) {
            if (isApproximatelyEqual(operands.get(0), 0) &&
                    isApproximatelyEqual(operands.get(1), 0) &&
                    isApproximatelyEqual(operands.get(2), 1)) {
                super.write(processor, new PdfLiteral("g"), Arrays.asList(new PdfNumber(0), new PdfLiteral("g")));
                return;
            }
        }

        if (SET_STROKE_RGB.equals(operatorString) && operands.size() == 4) {
            if (isApproximatelyEqual(operands.get(0), 0) &&
                    isApproximatelyEqual(operands.get(1), 0) &&
                    isApproximatelyEqual(operands.get(2), 1)) {
                super.write(processor, new PdfLiteral("G"), Arrays.asList(new PdfNumber(0), new PdfLiteral("G")));
                return;
            }
        }

        super.write(processor, operator, operands);
    }

    boolean isApproximatelyEqual(PdfObject number, float reference) {
        return number instanceof PdfNumber && Math.abs(reference - ((PdfNumber)number).floatValue()) < 0.01f;
    }

    final String SET_FILL_RGB = "rg";
    final String SET_STROKE_RGB = "RG";
};
for (int i = 1; i <= pdfDocument.getNumberOfPages(); i++) {
    editor.editPage(pdfDocument, i);
}

It does not work for the following files:

https://raad-dev-test.s3.ap-south-1.amazonaws.com/36/2019-08-30/021549Orig1s025_aprepitant_clinpharm_prea_Mac.pdf (page 41)

https://raad-dev-test.s3.ap-south-1.amazonaws.com/36/2019-08-30/400_206494S5_avibactam_and_ceftazidine_unireview_prea_Mac.pdf (page 60)

Please help.

Recommended answer

As described in a comment in the context of the referenced question:

It is easy to make the editor class above remove vector graphics by replacing fill or stroke instructions with instructions that drop the current path without drawing it. If this is done only when the applicable current color is blue, that would likely do the job for your example PDFs. But beware: in documents with other blue graphics (e.g. logos), these would be mutilated, too.

This is what the following content editor does:

class PdfGraphicsRemoverByColor extends PdfCanvasEditor {
    public PdfGraphicsRemoverByColor(Color color) {
        this.color = color;
    }

    @Override
    protected void write(PdfCanvasProcessor processor, PdfLiteral operator, List<PdfObject> operands)
    {
        String operatorString = operator.toString();

        if (color.equals(getGraphicsState().getFillColor())) {
            switch (operatorString) {
            case "f":
            case "f*":
            case "F":
                operatorString = "n";
                break;
            case "b":
            case "b*":
                operatorString = "s";
                break;
            case "B":
            case "B*":
                operatorString = "S";
                break;
            }
        }

        if (color.equals(getGraphicsState().getStrokeColor())) {
            switch (operatorString) {
            case "s":
            case "S":
                operatorString = "n";
                break;
            case "b":
            case "B":
                operatorString = "f";
                break;
            case "b*":
            case "B*":
                operatorString = "f*";
                break;
            }
        }

        operator = new PdfLiteral(operatorString);
        operands.set(operands.size() - 1, operator);
        super.write(processor, operator, operands);
    }

    final Color color;
}

(RemoveGraphicsByColor helper class)
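The operator substitution performed by the two switch blocks above can be looked at in isolation. The following is a minimal sketch in plain Java without any iText dependency; the class `PathPaintingRewriter` and its `rewrite` method are hypothetical names introduced only for illustration. The method takes a path-painting operator and two flags stating whether the current fill and stroke colors match, and returns the operator that would be written instead.

```java
// Hypothetical stand-alone helper (not part of iText): it mirrors the two
// switch blocks of PdfGraphicsRemoverByColor on plain strings.
final class PathPaintingRewriter {

    // Returns the operator to emit in place of the given one.
    static String rewrite(String operator, boolean fillMatches, boolean strokeMatches) {
        String result = operator;

        // Step 1: the fill color matches, so drop the fill part of the operator.
        if (fillMatches) {
            switch (result) {
                case "f": case "f*": case "F": result = "n"; break; // fill only -> end path without painting
                case "b": case "b*":           result = "s"; break; // close, fill, stroke -> close, stroke
                case "B": case "B*":           result = "S"; break; // fill, stroke -> stroke
                default: break;                                     // not a filling operator
            }
        }

        // Step 2: the stroke color matches, so drop the stroke part of what is left.
        if (strokeMatches) {
            switch (result) {
                case "s": case "S":   result = "n";  break; // stroke only -> end path without painting
                case "b": case "B":   result = "f";  break; // keep the nonzero-winding fill
                case "b*": case "B*": result = "f*"; break; // keep the even-odd fill
                default: break;                             // not a stroking operator
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] samples = { "f", "S", "B", "b*", "re" };
        for (String op : samples) {
            System.out.println(op
                    + " | fill matches: " + rewrite(op, true, false)
                    + " | stroke matches: " + rewrite(op, false, true)
                    + " | both match: " + rewrite(op, true, true));
        }
    }
}
```

Because the second step works on the result of the first, an operator such as `B` ends up as `n` when both colors match: the fill is removed first, and the remaining `S` is then removed as well. Operators that do not paint a path (e.g. `re`) pass through unchanged.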

Applying it like this:

try (   PdfReader pdfReader = new PdfReader(INPUT);
        PdfWriter pdfWriter = new PdfWriter(OUTPUT);
        PdfDocument pdfDocument = new PdfDocument(pdfReader, pdfWriter) )
{
    PdfCanvasEditor editor = new PdfGraphicsRemoverByColor(ColorConstants.BLUE);
    for (int i = 1; i <= pdfDocument.getNumberOfPages(); i++)
    {
        editor.editPage(pdfDocument, i);
    }
}

to the example files Control_of_nitrosamine_impurities_in_sartans__rev.pdf, EDQM_reports_issues_of_non-compliance_with_tooth__Mac.pdf, and originalFile.pdf from the referenced question, one gets:

[Figure: screenshots of the resulting pages]

Beware, this is merely a proof of concept, not a final and complete solution. In particular:

  • Only RGB blue is considered. This might be an issue particularly for documents explicitly designed for printing (which likely use CMYK colors).

  • All path fills and strokes are dropped as long as they are blue. Depending on your documents, this may have to be filtered further.

  • PdfCanvasEditor only inspects and edits the content stream of the page itself, not the content streams of displayed form XObjects or patterns; thus, some content may not be found. It can be generalized fairly easily.
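To illustrate the first caveat, here is a rough sketch of how a CMYK color could be tested for "blueness". It is plain Java with no iText dependency, and everything in it is an assumption made for illustration: the class name `CmykBlueCheck`, the naive device-independent CMYK-to-RGB formula (real conversions depend on the output profile), and the dominance thresholds, which simply follow the shape of the `isRgbBlue` check shown further below. In iText such a test would presumably be fed from the component values of a CMYK color object.

```java
// Hypothetical sketch: decide whether a CMYK color looks blue by converting
// it to RGB with the naive formula and checking that blue clearly dominates.
final class CmykBlueCheck {

    // Naive conversion, ignoring ICC profiles and ink behavior (an assumption).
    static float[] naiveCmykToRgb(float c, float m, float y, float k) {
        float r = (1f - c) * (1f - k);
        float g = (1f - m) * (1f - k);
        float b = (1f - y) * (1f - k);
        return new float[] { r, g, b };
    }

    // Blue must be reasonably strong and clearly larger than red and green.
    static boolean isBlueish(float r, float g, float b) {
        return b > .5f && r < .9f * b && g < .9f * b;
    }

    static boolean isCmykBlue(float c, float m, float y, float k) {
        float[] rgb = naiveCmykToRgb(c, m, y, k);
        return isBlueish(rgb[0], rgb[1], rgb[2]);
    }

    public static void main(String[] args) {
        System.out.println("C=1 M=1 Y=0 K=0 blue? " + isCmykBlue(1f, 1f, 0f, 0f));
        System.out.println("C=0 M=0 Y=0 K=1 blue? " + isCmykBlue(0f, 0f, 0f, 1f));
        System.out.println("C=0 M=0 Y=0 K=0 blue? " + isCmykBlue(0f, 0f, 0f, 0f));
    }
}
```

A check like this only addresses the color space limitation; the other two caveats (unfiltered removal of all blue paths, and content inside form XObjects or patterns) are unaffected by it.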

Testing the code above, you found documents in which the blue lines were not removed. As it turned out, these blue colors were not from the standard DeviceRGB color space but from ICCBased color spaces, profiled RGB color spaces to be more exact. Furthermore, one document used not a pure blue 0 0 1 but a .17255 .3098 .63529 blue.

To also be able to deal with these documents, the approach above must be generalized; for example, we can use a Predicate<Color> instead of a single, specific Color, like this:

class PdfGraphicsRemoverByColorPredicate extends PdfCanvasEditor {
    public PdfGraphicsRemoverByColorPredicate(Predicate<Color> colorPredicate) {
        this.colorPredicate = colorPredicate;
    }

    @Override
    protected void write(PdfCanvasProcessor processor, PdfLiteral operator, List<PdfObject> operands)
    {
        String operatorString = operator.toString();

        if (colorPredicate.test(getGraphicsState().getFillColor())) {
            switch (operatorString) {
            case "f":
            case "f*":
            case "F":
                operatorString = "n";
                break;
            case "b":
            case "b*":
                operatorString = "s";
                break;
            case "B":
            case "B*":
                operatorString = "S";
                break;
            }
        }

        if (colorPredicate.test(getGraphicsState().getStrokeColor())) {
            switch (operatorString) {
            case "s":
            case "S":
                operatorString = "n";
                break;
            case "b":
            case "B":
                operatorString = "f";
                break;
            case "b*":
            case "B*":
                operatorString = "f*";
                break;
            }
        }

        operator = new PdfLiteral(operatorString);
        operands.set(operands.size() - 1, operator);
        super.write(processor, operator, operands);
    }

    final Predicate<Color> colorPredicate;
}

(RemoveGraphicsByColor helper class)

Applying it like this:

try (   PdfReader pdfReader = new PdfReader(INPUT);
        PdfWriter pdfWriter = new PdfWriter(OUTPUT);
        PdfDocument pdfDocument = new PdfDocument(pdfReader, pdfWriter) )
{
    PdfCanvasEditor editor = new PdfGraphicsRemoverByColorPredicate(RemoveGraphicsByColor::isRgbBlue);
    for (int i = 1; i <= pdfDocument.getNumberOfPages(); i++)
    {
        editor.editPage(pdfDocument, i);
    }
}

(RemoveGraphicsByColor testRemoveAllBlueLinesFrom* tests)

to the new example files using this predicate method

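public static boolean isRgbBlue(Color color) {
    // Only RGB-like colors are considered: calibrated, device, or three-component ICC-profiled.
    if (color instanceof CalRgb || color instanceof DeviceRgb || (color instanceof IccBased && color.getNumberOfComponents() == 3)) {
        float[] components = color.getColorValue();
        float r = components[0];
        float g = components[1];
        float b = components[2];
        // "Blue" means a fairly strong blue component that clearly dominates red and green.
        return b > .5f && r < .9f*b && g < .9f*b;
    }
    // Any other color space (gray, CMYK, patterns, ...) is not treated as blue.
    return false;
}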

(RemoveGraphicsByColor helper method)

one gets:

[Figure: screenshots of the resulting pages]

Beware, the warnings from above still apply.
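The thresholds used in isRgbBlue can be tried out without iText. The sketch below applies the same comparison to plain float triples, including the two blues mentioned above (0 0 1 and .17255 .3098 .63529); the class name `BlueThresholdDemo` and the extra sample colors are assumptions added for illustration only.

```java
// Hypothetical stand-alone check of the threshold logic from isRgbBlue,
// operating on raw RGB components instead of iText Color objects.
final class BlueThresholdDemo {

    // Same comparison as in isRgbBlue: strong blue that dominates red and green.
    static boolean isBlue(float r, float g, float b) {
        return b > .5f && r < .9f * b && g < .9f * b;
    }

    public static void main(String[] args) {
        float[][] samples = {
            { 0f, 0f, 1f },                 // pure blue
            { .17255f, .3098f, .63529f },   // the darker blue found in one document
            { 0f, 0f, 0f },                 // black text
            { 1f, 1f, 1f },                 // white background
            { .6f, .6f, .6f }               // mid gray
        };
        for (float[] rgb : samples) {
            System.out.println(rgb[0] + " " + rgb[1] + " " + rgb[2]
                    + " -> " + isBlue(rgb[0], rgb[1], rgb[2]));
        }
    }
}
```

Both blues pass the test while black, white, and gray do not, since for neutral colors red and green are never below 90% of the blue component. Other bluish colors in a document, such as parts of a logo, would pass as well, which is exactly the risk the warnings describe.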
