Traverse whole PDF and change blue color to black (change color of underlines as well) + iText


Question

I am using the code below to remove blue colors from PDF text. It changes the text color correctly, but it does not change the color of the underlines.

Section of the original file: (screenshot not included)

Manipulated file: (screenshot not included)

As you can see in the manipulated file, the underline color did not change.

I have been looking for a fix for two weeks. Can anyone help with this? Below is my color-changing code:

public void testChangeBlackTextToGreenDocument(String source, String filename) throws IOException {
    try (InputStream resource = getClass().getResourceAsStream(source);
            PdfReader pdfReader = new PdfReader(source);
            OutputStream result = new FileOutputStream(filename);
            PdfWriter pdfWriter = new PdfWriter(result);
            PdfDocument pdfDocument = new PdfDocument(pdfReader, pdfWriter);) {
        PdfCanvasEditor editor = new PdfCanvasEditor() {

            @Override
            protected void write(PdfCanvasProcessor processor, PdfLiteral operator, List<PdfObject> operands) {

                String operatorString = operator.toString();

                if (TEXT_SHOWING_OPERATORS.contains(operatorString)) {
                    List<PdfObject> listobj = new ArrayList<>();
                    listobj.add(new PdfNumber(0));
                    listobj.add(new PdfNumber(0));
                    listobj.add(new PdfNumber(0));
                    listobj.add(new PdfLiteral("rg"));
                    if (currentlyReplacedBlack == null) {
                        Color currentFillColor =getGraphicsState().getFillColor();
                        if (ColorConstants.GREEN.equals(currentFillColor) || ColorConstants.CYAN.equals(currentFillColor) || ColorConstants.BLUE.equals(currentFillColor)) {
                            currentlyReplacedBlack = currentFillColor;
                            super.write(processor, new PdfLiteral("rg"), listobj);
                        }
                    }
                } else if (currentlyReplacedBlack != null) {
                    if (currentlyReplacedBlack instanceof DeviceCmyk) {
                        List<PdfObject> listobj = new ArrayList<>();
                        listobj.add(new PdfNumber(0));
                        listobj.add(new PdfNumber(0));
                        listobj.add(new PdfNumber(0));
                        listobj.add(new PdfNumber(0));
                        listobj.add(new PdfLiteral("k"));
                        super.write(processor, new PdfLiteral("k"), listobj);
                    } else if (currentlyReplacedBlack instanceof DeviceGray) {
                        List<PdfObject> listobj = new ArrayList<>();
                        listobj.add(new PdfNumber(0));
                        listobj.add(new PdfLiteral("g"));
                        super.write(processor, new PdfLiteral("g"), listobj);
                    } else {
                        List<PdfObject> listobj = new ArrayList<>();
                        listobj.add(new PdfNumber(0));
                        listobj.add(new PdfNumber(0));
                        listobj.add(new PdfNumber(0));
                        listobj.add(new PdfLiteral("rg"));
                        super.write(processor, new PdfLiteral("rg"), listobj);
                    }
                    currentlyReplacedBlack = null;
                }

                super.write(processor, operator, operands);
            }

            Color currentlyReplacedBlack = null;

            final List<String> TEXT_SHOWING_OPERATORS = Arrays.asList("Tj", "'", "\"", "TJ");
        };
        for (int i = 1; i <= pdfDocument.getNumberOfPages(); i++) {
            editor.editPage(pdfDocument, i);
        }
    }
    File file = new File(source);
    file.delete();
}

This is the original file: https://raad-dev-test.s3.ap-south-1.amazonaws.com/36/2019-08-30/originalFile.pdf

Related link:

Removing Watermark from PDF iTextSharp

Maven dependency details:

<dependency>
    <groupId>com.itextpdf</groupId>
    <artifactId>itext7-core</artifactId>
    <version>7.1.5</version>
    <type>pom</type>
</dependency>

<dependency>
    <groupId>com.itextpdf</groupId>
    <artifactId>itextpdf</artifactId>
    <version>5.0.6</version>
</dependency>

It does not work for the following files:

https://raad-dev-test.s3.ap-south-1.amazonaws.com/36/2019-08-30/021549Orig1s025_aprepitant_clinpharm_prea_Mac.pdf (page 41)

https://raad-dev-test.s3.ap-south-1.amazonaws.com/36/2019-08-30/400_206494S5_avibactam_and_ceftazidine_unireview_prea_Mac.pdf (page 60).

Please help.

Recommended answer

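(The example code here uses iText 7 for Java. You mentioned neither the iText version nor your programming environment in the tags or the question text, but your example code appears to indicate that this is your combination of choice.)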

The test on which you based your original code explicitly attempts to change only the text color. The "underline" in your document, though, is (as far as PDF drawing is concerned) not part of the text; it is drawn as a simple path. Thus, the original code by design does not touch the underline, and it has to be adapted for your task.
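To make the distinction concrete, here is a minimal sketch that needs no iText at all. The content stream fragment is a made-up example (not taken from your file), the class name is hypothetical, and the whitespace tokenization is a simplification that would break on strings containing spaces. It merely counts which operators show text and which paint paths.

```java
import java.util.Arrays;
import java.util.List;

/** Toy illustration only: classifies the operators of a made-up content stream fragment. */
class UnderlineAsPathDemo {
    // Text-showing operators, as listed in the question's code.
    static final List<String> TEXT_SHOWING_OPERATORS = Arrays.asList("Tj", "'", "\"", "TJ");
    // Path-painting operators of the PDF specification (fill, stroke, and combinations).
    static final List<String> PATH_PAINTING_OPERATORS =
            Arrays.asList("f", "F", "f*", "S", "s", "B", "B*", "b", "b*");

    // Hypothetical fragment: blue text, then a thin blue filled rectangle acting as underline.
    static final String SAMPLE =
            "0 0 1 rg BT /F1 12 Tf 72 700 Td (link) Tj ET 0 0 1 rg 72 698 24 0.5 re f";

    /** Counts the whitespace-separated tokens that appear in the given operator list. */
    static long count(String content, List<String> operators) {
        return Arrays.stream(content.trim().split("\\s+"))
                .filter(operators::contains)
                .count();
    }

    public static void main(String[] args) {
        System.out.println("text-showing: " + count(SAMPLE, TEXT_SHOWING_OPERATORS));   // 1 (Tj)
        System.out.println("path-painting: " + count(SAMPLE, PATH_PAINTING_OPERATORS)); // 1 (f)
    }
}
```

In this fragment the rectangle is painted by re f, which a check against the text-showing operators never matches, so logic keyed to Tj or TJ does not treat the underline as text.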

But your actual task, changing everything blue to black, is easier to implement than changing only the blue text, e.g.:

try (   PdfReader pdfReader = new PdfReader(SOURCE_PDF);
        PdfWriter pdfWriter = new PdfWriter(RESULT_PDF);
        PdfDocument pdfDocument = new PdfDocument(pdfReader, pdfWriter) )
{
    PdfCanvasEditor editor = new PdfCanvasEditor()
    {
        @Override
        protected void write(PdfCanvasProcessor processor, PdfLiteral operator, List<PdfObject> operands)
        {
            String operatorString = operator.toString();

            if (SET_FILL_RGB.equals(operatorString) && operands.size() == 4) {
                if (isApproximatelyEqual(operands.get(0), 0) &&
                        isApproximatelyEqual(operands.get(1), 0) &&
                        isApproximatelyEqual(operands.get(2), 1)) {
                    super.write(processor, new PdfLiteral("g"), Arrays.asList(new PdfNumber(0), new PdfLiteral("g")));
                    return;
                }
            }

            super.write(processor, operator, operands);
        }

        boolean isApproximatelyEqual(PdfObject number, float reference) {
            return number instanceof PdfNumber && Math.abs(reference - ((PdfNumber)number).floatValue()) < 0.01f;
        }

        final String SET_FILL_RGB = "rg";
    };
    for (int i = 1; i <= pdfDocument.getNumberOfPages(); i++)
    {
        editor.editPage(pdfDocument, i);
    }
}

Beware, this is merely a proof of concept, not a final and complete solution. In particular:

  • It looks only at the fill (non-stroking) colors. In your case that suffices, as both your text (as usual) and your underline use fill colors only: the underline is actually drawn not as a stroked line but as a slim, filled rectangle.
  • Only RGB blue is considered, and only such blue set using the rg instruction, not blue set using sc or scn, let alone blues combined out of other colors using funky blend modes. This might be an issue particularly for documents explicitly designed for printing (likely using CMYK colors).
  • PdfCanvasEditor only inspects and edits the content stream of the page itself, not the content streams of displayed form XObjects or patterns; thus, some content may not be found. It can be generalized fairly easily.
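As a rough illustration of the CMYK limitation, the sketch below shows how a CMYK color could be tested for blueness. It is not part of the code above. The class and method names are hypothetical, and the CMYK-to-RGB conversion is the naive formula that ignores color profiles, so treat it as an assumption, not as a color-accurate conversion.

```java
/** Sketch only: naive blueness test for CMYK operands (as set by the k / K operators). */
class CmykBlueCheck {
    /** Naive, profile-free CMYK to RGB conversion; all values are in the range 0..1. */
    static float[] cmykToRgb(float c, float m, float y, float k) {
        return new float[] { (1 - c) * (1 - k), (1 - m) * (1 - k), (1 - y) * (1 - k) };
    }

    /** Same tolerance as isApproximatelyEqual in the code above. */
    static boolean isApproximatelyEqual(float value, float reference) {
        return Math.abs(reference - value) < 0.01f;
    }

    /** True if the CMYK color converts to (approximately) pure RGB blue 0 0 1. */
    static boolean isPureBlueCmyk(float c, float m, float y, float k) {
        float[] rgb = cmykToRgb(c, m, y, k);
        return isApproximatelyEqual(rgb[0], 0)
                && isApproximatelyEqual(rgb[1], 0)
                && isApproximatelyEqual(rgb[2], 1);
    }

    public static void main(String[] args) {
        System.out.println(isPureBlueCmyk(1, 1, 0, 0)); // true: full cyan plus full magenta
        System.out.println(isPureBlueCmyk(0, 0, 0, 1)); // false: CMYK black
    }
}
```

In an editor like the one above, such a test would presumably apply to k instructions with five operands (four numbers plus the operator), and a match could be replaced by 0 g just as in the RGB case.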


Testing the fill-only code above, you soon found documents in which the underlines were not changed. As it turned out, these underlines are actually drawn as stroked lines, not as filled rectangles as above.

To properly edit such documents as well, you must therefore edit not only the fill colors but also the stroke colors, e.g. like this:

try (   PdfReader pdfReader = new PdfReader(SOURCE_PDF);
        PdfWriter pdfWriter = new PdfWriter(RESULT_PDF);
        PdfDocument pdfDocument = new PdfDocument(pdfReader, pdfWriter) )
{
    PdfCanvasEditor editor = new PdfCanvasEditor()
    {
        @Override
        protected void write(PdfCanvasProcessor processor, PdfLiteral operator, List<PdfObject> operands)
        {
            String operatorString = operator.toString();

            if (SET_FILL_RGB.equals(operatorString) && operands.size() == 4) {
                if (isApproximatelyEqual(operands.get(0), 0) &&
                        isApproximatelyEqual(operands.get(1), 0) &&
                        isApproximatelyEqual(operands.get(2), 1)) {
                    super.write(processor, new PdfLiteral("g"), Arrays.asList(new PdfNumber(0), new PdfLiteral("g")));
                    return;
                }
            }

            if (SET_STROKE_RGB.equals(operatorString) && operands.size() == 4) {
                if (isApproximatelyEqual(operands.get(0), 0) &&
                        isApproximatelyEqual(operands.get(1), 0) &&
                        isApproximatelyEqual(operands.get(2), 1)) {
                    super.write(processor, new PdfLiteral("G"), Arrays.asList(new PdfNumber(0), new PdfLiteral("G")));
                    return;
                }
            }

            super.write(processor, operator, operands);
        }

        boolean isApproximatelyEqual(PdfObject number, float reference) {
            return number instanceof PdfNumber && Math.abs(reference - ((PdfNumber)number).floatValue()) < 0.01f;
        }

        final String SET_FILL_RGB = "rg";
        final String SET_STROKE_RGB = "RG";
    };
    for (int i = 1; i <= pdfDocument.getNumberOfPages(); i++)
    {
        editor.editPage(pdfDocument, i);
    }
}

(ChangeColor tests testChangeRgbBlueToBlackControlOfNitrosamineImpuritiesInSartansRev and testChangeRgbBlueToBlackEdqmReportsIssuesOfNonComplianceWithToothMac)
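The fill-and-stroke replacement rule can also be tried out without iText on a plain string. This is a toy sketch only: the class name, the whitespace tokenization, and the sample stream are assumptions made for illustration, and real content streams need a proper parser such as the PdfCanvasEditor approach.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy sketch: replaces "0 0 1 rg" by "0 g" and "0 0 1 RG" by "0 G" in a token string. */
class BlueToBlackTokenDemo {
    /** True if the token is a number within 0.01 of the reference value. */
    static boolean isApproximatelyEqual(String token, float reference) {
        try {
            return Math.abs(reference - Float.parseFloat(token)) < 0.01f;
        } catch (NumberFormatException e) {
            return false; // not a number, so not a color component
        }
    }

    static String rewrite(String content) {
        List<String> out = new ArrayList<>(Arrays.asList(content.trim().split("\\s+")));
        for (int i = 3; i < out.size(); i++) {
            String op = out.get(i);
            if (("rg".equals(op) || "RG".equals(op))
                    && isApproximatelyEqual(out.get(i - 3), 0)
                    && isApproximatelyEqual(out.get(i - 2), 0)
                    && isApproximatelyEqual(out.get(i - 1), 1)) {
                // Drop the three operands and the operator, then insert gray level 0 (black).
                out.subList(i - 3, i + 1).clear();
                out.add(i - 3, "0");
                out.add(i - 2, "rg".equals(op) ? "g" : "G");
                i -= 2; // continue after the operator just inserted
            }
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        // Hypothetical stream: blue filled rectangle, blue stroked line, then a red fill color.
        String sample = "0 0 1 rg 72 698 24 0.5 re f 0 0 1 RG 72 690 m 96 690 l S 1 0 0 rg";
        System.out.println(rewrite(sample));
        // prints: 0 g 72 698 24 0.5 re f 0 G 72 690 m 96 690 l S 1 0 0 rg
    }
}
```

Both the filled rectangle and the stroked line end up black, while the red fill color is left alone.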


Testing the fill-and-stroke code above, you again found documents in which the blue colors were not changed. As it turned out, these blue colors were not from the standard DeviceRGB color space but from ICCBased color spaces, profiled RGB color spaces to be more exact. In particular, different color-setting operators were used than before: sc / scn instead of rg. Furthermore, one document used not a pure blue 0 0 1 but a .17255 .3098 .63529 blue.

If we assume that sc and scn instructions with three numeric arguments set some flavor of RGB color, as they do here (in general this is an oversimplification, as Lab and other color spaces can also come with three components, but your documents seem RGB oriented), and if we are less strict in recognizing the blue color, we can generalize the code above as follows:

class AllRgbBlueToBlackConverter extends PdfCanvasEditor {
    @Override
    protected void write(PdfCanvasProcessor processor, PdfLiteral operator, List<PdfObject> operands)
    {
        String operatorString = operator.toString();

        if (RGB_SETTER_CANDIDATES.contains(operatorString) && operands.size() == 4) {
            if (isBlue(operands.get(0), operands.get(1), operands.get(2))) {
                PdfNumber number0 = new PdfNumber(0);
                operands.set(0, number0);
                operands.set(1, number0);
                operands.set(2, number0);
            }
        }

        super.write(processor, operator, operands);
    }

    boolean isBlue(PdfObject red, PdfObject green, PdfObject blue) {
        if (red instanceof PdfNumber && green instanceof PdfNumber && blue instanceof PdfNumber) {
            float r = ((PdfNumber)red).floatValue();
            float g = ((PdfNumber)green).floatValue();
            float b = ((PdfNumber)blue).floatValue();
            return b > .5f && r < .9f*b && g < .9f*b;
        }
        return false;
    }

    final Set<String> RGB_SETTER_CANDIDATES = new HashSet<>(Arrays.asList("rg", "RG", "sc", "SC", "scn", "SCN"));
}

(ChangeColor helper class)
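To get a feel for how loose the isBlue rule is, the sketch below applies the same inequality to a few plain float triples. The class name and the choice of sample colors (other than the two blues mentioned in this answer) are assumptions for illustration.

```java
/** Sketch: the isBlue inequality from AllRgbBlueToBlackConverter, applied to plain floats. */
class BlueHeuristicDemo {
    /** Blue must dominate: b above 0.5, and red and green each below 90% of b. */
    static boolean isBlue(float r, float g, float b) {
        return b > .5f && r < .9f * b && g < .9f * b;
    }

    public static void main(String[] args) {
        System.out.println(isBlue(0f, 0f, 1f));               // true: pure blue
        System.out.println(isBlue(.17255f, .3098f, .63529f)); // true: the blue from one of the documents
        System.out.println(isBlue(0f, 0f, 0f));               // false: black
        System.out.println(isBlue(1f, 1f, 1f));               // false: white
        System.out.println(isBlue(0f, 0f, .4f));              // false: dark navy, b is not above 0.5
        System.out.println(isBlue(.8f, 0f, 1f));              // true: a magenta-like color also matches
    }
}
```

The last two cases show the trade-off of a relaxed test: a dark navy would stay unchanged, while a magenta-like color would be turned black, so the thresholds may need tuning for other documents.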

The AllRgbBlueToBlackConverter class can be used like this:

try (   PdfReader pdfReader = new PdfReader(INPUT);
        PdfWriter pdfWriter = new PdfWriter(OUTPUT);
        PdfDocument pdfDocument = new PdfDocument(pdfReader, pdfWriter) ) {
    PdfCanvasEditor editor = new AllRgbBlueToBlackConverter();
    for (int i = 1; i <= pdfDocument.getNumberOfPages(); i++)
    {
        editor.editPage(pdfDocument, i);
    }
}

