Removing Watermark from PDF iTextSharp


Question


I have gone through the solution suggested here, but my problem is a little different. In the solution provided at the above link, one can remove the watermark only if iTextSharp was used to add the watermark as well. In my case, I am adding a watermark in some cases using Microsoft Word. When I use the following code, the watermark does disappear from the PDF, but when I convert the PDF to Word, the watermark appears again as an image. As I understand it, the code below changes the opacity value of the watermark to 0, which is why it disappears.

private static void removeWatermark(string watermarkedFile, string unwatermarkedFile)
{
    PdfReader.unethicalreading = true;
    PdfReader reader = new PdfReader(watermarkedFile);
    reader.RemoveUnusedObjects();
    int pageCount = reader.NumberOfPages;
    for (int i = 1; i <= pageCount; i++)
    {
        var page = reader.GetPageN(i);
        PdfDictionary resources = page.GetAsDict(PdfName.RESOURCES);
        PdfDictionary extGStates = resources.GetAsDict(PdfName.EXTGSTATE);
        if (extGStates == null)
            continue;

        foreach (PdfName name in extGStates.Keys)
        {
            var obj = extGStates.Get(name);
            PdfDictionary extGStateObject = (PdfDictionary)PdfReader.GetPdfObject(obj);
            var stateNumber = extGStateObject.Get(PdfName.ca);
            if (stateNumber == null)
                continue;

            var caNumber = (PdfNumber)PdfReader.GetPdfObject(stateNumber);
            if (caNumber.FloatValue != 1f)
            {
                extGStateObject.Remove(PdfName.ca);

                extGStateObject.Put(PdfName.ca, new PdfNumber(0f));
            }
        }
    }

    using (FileStream fs = new FileStream(unwatermarkedFile, FileMode.Create, FileAccess.Write, FileShare.None))
    {
        using (PdfStamper stamper = new PdfStamper(reader, fs))
        {
            stamper.SetFullCompression();
            stamper.Close();
        }
    }
}

Is there a way to delete this watermark by modifying the code?

Solution

As the OP already mentioned, if you have complete control over the process originally creating the watermark, you can do as @ChrisHaas explained in his answer to the question the OP referred to.

If on the other hand the tool you create the watermark with does so in its own way, you will need a method customized for those watermarks.

This method usually will require that you edit some content stream. @ChrisHaas' solution, by the way, does so, too.

To make this easier, one should start by creating a generic content stream editing functionality and then only use this functionality to edit out those watermarks.

Thus, here is first a sample generic content stream editor class, followed by a solution based on it that edits out the OP's sample watermark.

A generic content stream editor class

This PdfContentStreamEditor class parses the original content stream instruction by instruction keeping track of a part of the graphics state; the instructions are forwarded to its Write method which by default writes them back just as they come in, effectively creating an identical or at least equivalent copy of the original stream.

To actually edit the stream, simply override this Write method and only forward instructions you want in the result stream to the base Write method.

public class PdfContentStreamEditor : PdfContentStreamProcessor
{
    /**
     * This method edits the immediate contents of a page, i.e. its content stream.
     * It explicitly does not descend into form xobjects, patterns, or annotations.
     */
    public void EditPage(PdfStamper pdfStamper, int pageNum)
    {
        PdfReader pdfReader = pdfStamper.Reader;
        PdfDictionary page = pdfReader.GetPageN(pageNum);
        byte[] pageContentInput = ContentByteUtils.GetContentBytesForPage(pdfReader, pageNum);
        page.Remove(PdfName.CONTENTS);
        EditContent(pageContentInput, page.GetAsDict(PdfName.RESOURCES), pdfStamper.GetUnderContent(pageNum));
    }

    /**
     * This method processes the content bytes and outputs to the given canvas.
     * It explicitly does not descend into form xobjects, patterns, or annotations.
     */
    public void EditContent(byte[] contentBytes, PdfDictionary resources, PdfContentByte canvas)
    {
        this.canvas = canvas;
        ProcessContent(contentBytes, resources);
        this.canvas = null;
    }

    /**
     * This method writes content stream operations to the target canvas. The default
     * implementation writes them as they come, so it essentially generates identical
     * copies of the original instructions the {@link ContentOperatorWrapper} instances
     * forward to it.
     *
     * Override this method to achieve some fancy editing effect.
     */
    protected virtual void Write(PdfContentStreamProcessor processor, PdfLiteral operatorLit, List<PdfObject> operands)
    {
        int index = 0;

        foreach (PdfObject pdfObject in operands)
        {
            pdfObject.ToPdf(canvas.PdfWriter, canvas.InternalBuffer);
            canvas.InternalBuffer.Append(operands.Count > ++index ? (byte) ' ' : (byte) '\n');
        }
    }

    //
    // constructor giving the parent a dummy listener to talk to 
    //
    public PdfContentStreamEditor() : base(new DummyRenderListener())
    {
    }

    //
    // Overrides of PdfContentStreamProcessor methods
    //
    public override IContentOperator RegisterContentOperator(String operatorString, IContentOperator newOperator)
    {
        ContentOperatorWrapper wrapper = new ContentOperatorWrapper();
        wrapper.setOriginalOperator(newOperator);
        IContentOperator formerOperator = base.RegisterContentOperator(operatorString, wrapper);
        return formerOperator is ContentOperatorWrapper ? ((ContentOperatorWrapper)formerOperator).getOriginalOperator() : formerOperator;
    }

    public override void ProcessContent(byte[] contentBytes, PdfDictionary resources)
    {
        this.resources = resources; 
        base.ProcessContent(contentBytes, resources);
        this.resources = null;
    }

    //
    // members holding the output canvas and the resources
    //
    protected PdfContentByte canvas = null;
    protected PdfDictionary resources = null;

    //
    // A content operator class to wrap all content operators to forward the invocation to the editor
    //
    class ContentOperatorWrapper : IContentOperator
    {
        public IContentOperator getOriginalOperator()
        {
            return originalOperator;
        }

        public void setOriginalOperator(IContentOperator originalOperator)
        {
            this.originalOperator = originalOperator;
        }

        public void Invoke(PdfContentStreamProcessor processor, PdfLiteral oper, List<PdfObject> operands)
        {
            if (originalOperator != null && !"Do".Equals(oper.ToString()))
            {
                originalOperator.Invoke(processor, oper, operands);
            }
            ((PdfContentStreamEditor)processor).Write(processor, oper, operands);
        }

        private IContentOperator originalOperator = null;
    }

    //
    // A dummy render listener to give to the underlying content stream processor to feed events to
    //
    class DummyRenderListener : IRenderListener
    {
        public void BeginTextBlock() { }

        public void RenderText(TextRenderInfo renderInfo) { }

        public void EndTextBlock() { }

        public void RenderImage(ImageRenderInfo renderInfo) { }
    }
}
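
To show the override pattern in isolation, here is a minimal toy model in plain Java. It does not use iText at all: the class names `ToyStreamEditor` and `DropTextEditor` are hypothetical, and the "content stream" is simplified to one instruction per line with the operator as the last token. It only mirrors the idea that the base class forwards every instruction to a write method which copies it by default, while a subclass edits the stream by choosing what to forward.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy stand-in for the editor: every instruction is forwarded to write(),
// which by default copies it to the output unchanged.
class ToyStreamEditor {
    protected final StringBuilder out = new StringBuilder();

    // Assumption: one instruction per line, operands first, operator last.
    String edit(String content) {
        out.setLength(0);
        for (String line : content.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) continue;
            List<String> tokens = new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
            String operator = tokens.get(tokens.size() - 1);
            write(operator, tokens);
        }
        return out.toString();
    }

    // Default behavior: write the instruction back as it came in.
    protected void write(String operator, List<String> tokens) {
        out.append(String.join(" ", tokens)).append('\n');
    }
}

// Editing by overriding write(): text-showing instructions are not forwarded.
class DropTextEditor extends ToyStreamEditor {
    @Override
    protected void write(String operator, List<String> tokens) {
        if ("Tj".equals(operator)) return;
        super.write(operator, tokens);
    }

    public static void main(String[] args) {
        String content = "BT\n(Hi) Tj\nET\n0 0 m\n10 10 l\nS\n";
        System.out.print(new DropTextEditor().edit(content));
    }
}
```

As in the real class, the token list handed to the write method includes the operator as its last element, which is why replacing that element is enough to swap one operator for another.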

Some background:

This class extends the PdfContentStreamProcessor from the iTextSharp parser namespace. That base class was originally designed merely to parse content streams and return information for text, image, or graphics extraction. We use it to keep track of a part of the graphics state, more exactly those graphics state parameters relevant for text extraction.

If for specific editing tasks one also needs pre-processed information on e.g. the text drawn by the current instruction, one can use a custom IRenderListener implementation to retrieve that information instead of the DummyRenderListener used here which simply ignores it.

This class architecture is inspired by the PdfCleanUpProcessor from the iTextSharp.xtra extra library.

An editor to hide the OP's watermark

As the OP has already found out, his watermarks can be recognized as the only document parts using transparency defined in an ExtGState object as ca value. To hide the watermark we therefore have to

  • recognize graphics state changes with respect to that value and
  • not draw anything when the recognized current ca value is less than 1.

Actually, the watermark is built using vector graphics operations, so we can restrict our editing to those operations. We can even restrict it to changing the final path painting instruction ("stroke" / "fill" / "fill-and-stroke" plus certain variations) so that it no longer performs the part (filling or stroking) which generates transparent content.

public class TransparentGraphicsRemover : PdfContentStreamEditor
{
    protected override void Write(PdfContentStreamProcessor processor, PdfLiteral oper, List<PdfObject> operands)
    {
        String operatorString = oper.ToString();
        if ("gs".Equals(operatorString))
        {
            updateTransparencyFrom((PdfName) operands[0]);
        }

        if (operatorMapping.ContainsKey(operatorString))
        {
            // Downgrade the drawing operator if transparency is involved
            // For details cf. the comment before the operatorMapping declaration
            PdfLiteral[] mapping = operatorMapping[operatorString];

            int index = 0;
            if (strokingAlpha < 1)
                index |= 1;
            if (nonStrokingAlpha < 1)
                index |= 2;

            oper = mapping[index];
            operands[operands.Count - 1] = oper;
        }

        base.Write(processor, oper, operands);
    }

    // The current transparency values; beware: save and restore state operations are ignored!
    float strokingAlpha = 1;
    float nonStrokingAlpha = 1;

    void updateTransparencyFrom(PdfName gsName)
    {
        PdfDictionary extGState = getGraphicsStateDictionary(gsName);
        if (extGState != null)
        {
            PdfNumber number = extGState.GetAsNumber(PdfName.ca);
            if (number != null)
                nonStrokingAlpha = number.FloatValue;
            number = extGState.GetAsNumber(PdfName.CA);
            if (number != null)
                strokingAlpha = number.FloatValue;
        }
    }

    PdfDictionary getGraphicsStateDictionary(PdfName gsName)
    {
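        // The resources may lack an ExtGState dictionary; the caller handles a null result.
        PdfDictionary extGStates = resources != null ? resources.GetAsDict(PdfName.EXTGSTATE) : null;
        return extGStates != null ? extGStates.GetAsDict(gsName) : null;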
    }

    //
    // Map from an operator name to an array of operations it becomes depending
    // on the current graphics state:
    //
    // * [0] the operation in case of no transparency
    // * [1] the operation in case of stroking transparency
    // * [2] the operation in case of non-stroking transparency
    // * [3] the operation in case of stroking and non-stroking transparency
    //
    Dictionary<String, PdfLiteral[]> operatorMapping = new Dictionary<String, PdfLiteral[]>();

    public TransparentGraphicsRemover()
    {
        PdfLiteral _S = new PdfLiteral("S");
        PdfLiteral _s = new PdfLiteral("s");
        PdfLiteral _f = new PdfLiteral("f");
        PdfLiteral _fStar = new PdfLiteral("f*");
        PdfLiteral _B = new PdfLiteral("B");
        PdfLiteral _BStar = new PdfLiteral("B*");
        PdfLiteral _b = new PdfLiteral("b");
        PdfLiteral _bStar = new PdfLiteral("b*");
        PdfLiteral _n = new PdfLiteral("n");

        operatorMapping["S"] = new PdfLiteral[]{ _S, _n, _S, _n };
        operatorMapping["s"] = new PdfLiteral[]{ _s, _n, _s, _n };
        operatorMapping["f"] = new PdfLiteral[]{ _f, _f, _n, _n };
        operatorMapping["F"] = new PdfLiteral[]{ _f, _f, _n, _n };
        operatorMapping["f*"] = new PdfLiteral[]{ _fStar, _fStar, _n, _n };
        operatorMapping["B"] = new PdfLiteral[]{ _B, _f, _S, _n };
        operatorMapping["B*"] = new PdfLiteral[]{ _BStar, _fStar, _S, _n };
        operatorMapping["b"] = new PdfLiteral[] { _b, _f, _s, _n };
        operatorMapping["b*"] = new PdfLiteral[]{ _bStar, _fStar, _s, _n };
    }
}
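
As a worked example of the lookup above, here is a library-free sketch in plain Java. The class name `OperatorDowngradeSketch` is hypothetical and operator names are plain strings instead of `PdfLiteral` objects; the table and the bit index (bit 0 for stroking transparency, bit 1 for non-stroking transparency) are copied from the editor class.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, iText-free illustration of the operator downgrade lookup.
class OperatorDowngradeSketch {
    // [0] no transparency, [1] stroking transparent,
    // [2] non-stroking transparent, [3] both transparent
    static final Map<String, String[]> MAPPING = new HashMap<>();
    static {
        MAPPING.put("S",  new String[] { "S",  "n",  "S", "n" });
        MAPPING.put("s",  new String[] { "s",  "n",  "s", "n" });
        MAPPING.put("f",  new String[] { "f",  "f",  "n", "n" });
        MAPPING.put("F",  new String[] { "f",  "f",  "n", "n" });
        MAPPING.put("f*", new String[] { "f*", "f*", "n", "n" });
        MAPPING.put("B",  new String[] { "B",  "f",  "S", "n" });
        MAPPING.put("B*", new String[] { "B*", "f*", "S", "n" });
        MAPPING.put("b",  new String[] { "b",  "f",  "s", "n" });
        MAPPING.put("b*", new String[] { "b*", "f*", "s", "n" });
    }

    // Returns the operator to write instead of the given one.
    static String downgrade(String operator, float strokingAlpha, float nonStrokingAlpha) {
        String[] mapping = MAPPING.get(operator);
        if (mapping == null) return operator; // not a path painting operator
        int index = 0;
        if (strokingAlpha < 1) index |= 1;
        if (nonStrokingAlpha < 1) index |= 2;
        return mapping[index];
    }

    public static void main(String[] args) {
        // Fill is transparent, stroke is opaque: "fill and stroke" becomes "stroke".
        System.out.println(downgrade("B", 1f, 0.5f));
    }
}
```

With an opaque stroke and a transparent fill, the index is 2, so `B` is written as `S`; with both transparent the index is 3 and every painting operator becomes `n`, which ends the path without painting it.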

Beware: This sample editor is very simple:

  • It only considers transparency created by the ExtGState parameters ca and CA; in particular, it ignores masks.
  • It does not look for operations saving or restoring the graphics state.

These limitations can easily be lifted, but doing so requires more code than is appropriate for a Stack Overflow answer.
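
To give an idea of how the second limitation might be lifted, here is a minimal sketch in plain Java of alpha tracking that honors save and restore. The class `AlphaStateTracker` and its method names are hypothetical and not part of iText; the assumption is that the editor's write method would call `save()` on a `q` operator, `restore()` on a `Q` operator, and `applyExtGState(...)` on a `gs` operator.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper: tracks stroking (CA) and non-stroking (ca) alpha
// across graphics state save ("q") and restore ("Q") operations.
class AlphaStateTracker {
    private float strokingAlpha = 1f;
    private float nonStrokingAlpha = 1f;
    private final Deque<float[]> saved = new ArrayDeque<>();

    // Call when a "q" operator is seen.
    void save() {
        saved.push(new float[] { strokingAlpha, nonStrokingAlpha });
    }

    // Call when a "Q" operator is seen; an unbalanced "Q" is ignored.
    void restore() {
        if (saved.isEmpty()) return;
        float[] state = saved.pop();
        strokingAlpha = state[0];
        nonStrokingAlpha = state[1];
    }

    // Call when a "gs" operator is seen; null means the ExtGState
    // dictionary has no such entry, so the current value is kept.
    void applyExtGState(Float strokingCA, Float nonStrokingCa) {
        if (strokingCA != null) strokingAlpha = strokingCA;
        if (nonStrokingCa != null) nonStrokingAlpha = nonStrokingCa;
    }

    float getStrokingAlpha() { return strokingAlpha; }
    float getNonStrokingAlpha() { return nonStrokingAlpha; }
}
```

A stack is used because `q` and `Q` can nest; each `Q` then brings back the alpha values that were current at the matching `q`.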

Applying this editor to the OP's sample file like this

string source = @"test3.pdf";
string dest = @"test3-noTransparency.pdf";

using (PdfReader pdfReader = new PdfReader(source))
using (PdfStamper pdfStamper = new PdfStamper(pdfReader, new FileStream(dest, FileMode.Create, FileAccess.Write)))
{
    PdfContentStreamEditor editor = new TransparentGraphicsRemover();

    for (int i = 1; i <= pdfReader.NumberOfPages; i++)
    {
        editor.EditPage(pdfStamper, i);
    }
}

results in a PDF file without the watermark.

I don't have the tools the OP used to export the contents to Word (NitroPDF and Foxit), so I could not run a final test. Adobe Acrobat (version 9.5), at least, does not include the watermark when exporting to Word.

If the OP's tools still have traces of the watermark in the exported Word files, one can easily improve this class to actually drop path creation and drawing operations while transparency is active.
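
A rough sketch of what such a check could look like, again in plain Java without iText: the class `PathDropSketch` and the method `shouldDrop` are hypothetical, and the operator list is my own selection of the path construction, path painting, and clipping operators. Note the assumption that clipping operators are dropped along with the path they refer to; this changes clipping for later content, so a real implementation would have to decide how to treat that case.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical predicate: should an instruction be left out of the result stream?
class PathDropSketch {
    static final Set<String> PATH_OPERATORS = new HashSet<>(Arrays.asList(
            "m", "l", "c", "v", "y", "h", "re",                   // path construction
            "S", "s", "f", "F", "f*", "B", "B*", "b", "b*", "n",  // path painting
            "W", "W*"));                                          // clipping (assumption: dropped too)

    // Drops every path-related instruction while any alpha value is below 1.
    static boolean shouldDrop(String operator, float strokingAlpha, float nonStrokingAlpha) {
        boolean transparent = strokingAlpha < 1 || nonStrokingAlpha < 1;
        return transparent && PATH_OPERATORS.contains(operator);
    }

    public static void main(String[] args) {
        System.out.println(shouldDrop("re", 1f, 0.5f)); // path construction under transparency
        System.out.println(shouldDrop("Tj", 1f, 0.5f)); // text is left alone in this sketch
    }
}
```

In the editor class, the write override would simply return without calling the base implementation whenever this predicate is true.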

The same in Java

I started implementing this for iText in Java and only later realized the OP had iTextSharp for .NET in mind. Here are the equivalent Java classes:

public class PdfContentStreamEditor extends PdfContentStreamProcessor
{
    /**
     * This method edits the immediate contents of a page, i.e. its content stream.
     * It explicitly does not descend into form xobjects, patterns, or annotations.
     */
    public void editPage(PdfStamper pdfStamper, int pageNum) throws IOException
    {
        PdfReader pdfReader = pdfStamper.getReader();
        PdfDictionary page = pdfReader.getPageN(pageNum);
        byte[] pageContentInput = ContentByteUtils.getContentBytesForPage(pdfReader, pageNum);
        page.remove(PdfName.CONTENTS);
        editContent(pageContentInput, page.getAsDict(PdfName.RESOURCES), pdfStamper.getUnderContent(pageNum));
    }

    /**
     * This method processes the content bytes and outputs to the given canvas.
     * It explicitly does not descend into form xobjects, patterns, or annotations.
     */
    public void editContent(byte[] contentBytes, PdfDictionary resources, PdfContentByte canvas)
    {
        this.canvas = canvas;
        processContent(contentBytes, resources);
        this.canvas = null;
    }

    /**
     * <p>
     * This method writes content stream operations to the target canvas. The default
     * implementation writes them as they come, so it essentially generates identical
     * copies of the original instructions the {@link ContentOperatorWrapper} instances
     * forward to it.
     * </p>
     * <p>
     * Override this method to achieve some fancy editing effect.
     * </p> 
     */
    protected void write(PdfContentStreamProcessor processor, PdfLiteral operator, List<PdfObject> operands) throws IOException
    {
        int index = 0;

        for (PdfObject object : operands)
        {
            object.toPdf(canvas.getPdfWriter(), canvas.getInternalBuffer());
            canvas.getInternalBuffer().append(operands.size() > ++index ? (byte) ' ' : (byte) '\n');
        }
    }

    //
    // constructor giving the parent a dummy listener to talk to 
    //
    public PdfContentStreamEditor()
    {
        super(new DummyRenderListener());
    }

    //
    // Overrides of PdfContentStreamProcessor methods
    //
    @Override
    public ContentOperator registerContentOperator(String operatorString, ContentOperator operator)
    {
        ContentOperatorWrapper wrapper = new ContentOperatorWrapper();
        wrapper.setOriginalOperator(operator);
        ContentOperator formerOperator = super.registerContentOperator(operatorString, wrapper);
        return formerOperator instanceof ContentOperatorWrapper ? ((ContentOperatorWrapper)formerOperator).getOriginalOperator() : formerOperator;
    }

    @Override
    public void processContent(byte[] contentBytes, PdfDictionary resources)
    {
        this.resources = resources; 
        super.processContent(contentBytes, resources);
        this.resources = null;
    }

    //
    // members holding the output canvas and the resources
    //
    protected PdfContentByte canvas = null;
    protected PdfDictionary resources = null;

    //
    // A content operator class to wrap all content operators to forward the invocation to the editor
    //
    class ContentOperatorWrapper implements ContentOperator
    {
        public ContentOperator getOriginalOperator()
        {
            return originalOperator;
        }

        public void setOriginalOperator(ContentOperator originalOperator)
        {
            this.originalOperator = originalOperator;
        }

        @Override
        public void invoke(PdfContentStreamProcessor processor, PdfLiteral operator, ArrayList<PdfObject> operands) throws Exception
        {
            if (originalOperator != null && !"Do".equals(operator.toString()))
            {
                originalOperator.invoke(processor, operator, operands);
            }
            write(processor, operator, operands);
        }

        private ContentOperator originalOperator = null;
    }

    //
    // A dummy render listener to give to the underlying content stream processor to feed events to
    //
    static class DummyRenderListener implements RenderListener
    {
        @Override
        public void beginTextBlock() { }

        @Override
        public void renderText(TextRenderInfo renderInfo) { }

        @Override
        public void endTextBlock() { }

        @Override
        public void renderImage(ImageRenderInfo renderInfo) { }
    }
}

(PdfContentStreamEditor.java)

public class TransparentGraphicsRemover extends PdfContentStreamEditor
{
    @Override
    protected void write(PdfContentStreamProcessor processor, PdfLiteral operator, List<PdfObject> operands) throws IOException
    {
        String operatorString = operator.toString();
        if ("gs".equals(operatorString))
        {
            updateTransparencyFrom((PdfName) operands.get(0));
        }

        PdfLiteral[] mapping = operatorMapping.get(operatorString);

        if (mapping != null)
        {
            int index = 0;
            if (strokingAlpha < 1)
                index |= 1;
            if (nonStrokingAlpha < 1)
                index |= 2;

            operator = mapping[index];
            operands.set(operands.size() - 1, operator);
        }

        super.write(processor, operator, operands);
    }

    // The current transparency values; beware: save and restore state operations are ignored!
    float strokingAlpha = 1;
    float nonStrokingAlpha = 1;

    void updateTransparencyFrom(PdfName gsName)
    {
        PdfDictionary extGState = getGraphicsStateDictionary(gsName);
        if (extGState != null)
        {
            PdfNumber number = extGState.getAsNumber(PdfName.ca);
            if (number != null)
                nonStrokingAlpha = number.floatValue();
            number = extGState.getAsNumber(PdfName.CA);
            if (number != null)
                strokingAlpha = number.floatValue();
        }
    }

    PdfDictionary getGraphicsStateDictionary(PdfName gsName)
    {
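        // The resources may lack an ExtGState dictionary; the caller handles a null result.
        PdfDictionary extGStates = resources != null ? resources.getAsDict(PdfName.EXTGSTATE) : null;
        return extGStates != null ? extGStates.getAsDict(gsName) : null;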
    }

    //
    // Map from an operator name to an array of operations it becomes depending
    // on the current graphics state:
    //
    // * [0] the operation in case of no transparency
    // * [1] the operation in case of stroking transparency
    // * [2] the operation in case of non-stroking transparency
    // * [3] the operation in case of stroking and non-stroking transparency
    //
    static Map<String, PdfLiteral[]> operatorMapping = new HashMap<String, PdfLiteral[]>();
    static
    {
        PdfLiteral _S = new PdfLiteral("S");
        PdfLiteral _s = new PdfLiteral("s");
        PdfLiteral _f = new PdfLiteral("f");
        PdfLiteral _fStar = new PdfLiteral("f*");
        PdfLiteral _B = new PdfLiteral("B");
        PdfLiteral _BStar = new PdfLiteral("B*");
        PdfLiteral _b = new PdfLiteral("b");
        PdfLiteral _bStar = new PdfLiteral("b*");
        PdfLiteral _n = new PdfLiteral("n");

        operatorMapping.put("S", new PdfLiteral[]{ _S, _n, _S, _n });
        operatorMapping.put("s", new PdfLiteral[]{ _s, _n, _s, _n });
        operatorMapping.put("f", new PdfLiteral[]{ _f, _f, _n, _n });
        operatorMapping.put("F", new PdfLiteral[]{ _f, _f, _n, _n });
        operatorMapping.put("f*", new PdfLiteral[]{ _fStar, _fStar, _n, _n });
        operatorMapping.put("B", new PdfLiteral[]{ _B, _f, _S, _n });
        operatorMapping.put("B*", new PdfLiteral[]{ _BStar, _fStar, _S, _n });
        operatorMapping.put("b", new PdfLiteral[]{ _b, _f, _s, _n });
        operatorMapping.put("b*", new PdfLiteral[]{ _bStar, _fStar, _s, _n });
    }
}

(TransparentGraphicsRemover.java)

@Test
public void testRemoveTransparentGraphicsTest3() throws IOException, DocumentException
{
    try (   InputStream resource = getClass().getResourceAsStream("test3.pdf");
            OutputStream result = new FileOutputStream(new File(RESULT_FOLDER, "test3-noTransparency.pdf")))
    {
        PdfReader pdfReader = new PdfReader(resource);
        PdfStamper pdfStamper = new PdfStamper(pdfReader, result);
        PdfContentStreamEditor editor = new TransparentGraphicsRemover();

        for (int i = 1; i <= pdfReader.getNumberOfPages(); i++)
        {
            editor.editPage(pdfStamper, i);
        }

        pdfStamper.close();
    }
}

(excerpt from EditPageContent.java)
