Removing Watermark from PDF iTextSharp


Problem description

I have gone through the solution suggested here, but my problem is a little different. With the solution provided at the above link, one can remove the watermark only if iTextSharp was also used to add it. In my case, I sometimes add the watermark using Microsoft Word. When I use the following code, the watermark does disappear from the PDF, but when I convert the PDF to Word, the watermark appears again as an image. As far as I understand, the code below changes the opacity value of the watermark to 0, and therefore it disappears.

private static void removeWatermark(string watermarkedFile, string unwatermarkedFile)
{
    PdfReader.unethicalreading = true;
    PdfReader reader = new PdfReader(watermarkedFile);
    reader.RemoveUnusedObjects();
    int pageCount = reader.NumberOfPages;
    for (int i = 1; i <= pageCount; i++)
    {
        var page = reader.GetPageN(i);
        PdfDictionary resources = page.GetAsDict(PdfName.RESOURCES);
        PdfDictionary extGStates = resources.GetAsDict(PdfName.EXTGSTATE);
        if (extGStates == null)
            continue;

        foreach (PdfName name in extGStates.Keys)
        {
            var obj = extGStates.Get(name);
            PdfDictionary extGStateObject = (PdfDictionary)PdfReader.GetPdfObject(obj);
            var stateNumber = extGStateObject.Get(PdfName.ca);
            if (stateNumber == null)
                continue;

            var caNumber = (PdfNumber)PdfReader.GetPdfObject(stateNumber);
            if (caNumber.FloatValue != 1f)
            {
                extGStateObject.Remove(PdfName.ca);

                extGStateObject.Put(PdfName.ca, new PdfNumber(0f));
            }
        }
    }

    using (FileStream fs = new FileStream(unwatermarkedFile, FileMode.Create, FileAccess.Write, FileShare.None))
    {
        using (PdfStamper stamper = new PdfStamper(reader, fs))
        {
            stamper.SetFullCompression();
            stamper.Close();
        }
    }
}

Is there a way to actually delete this watermark by modifying the code?

Solution

As the OP already mentioned, if you have complete control over the process originally creating the watermark, you can do as @ChrisHaas explained in his answer to the question the OP referred to.

If on the other hand the tool you create the watermark with does so in its own way, you will need a method customized for those watermarks.

This method usually requires that you edit some content stream. @ChrisHaas' solution, by the way, does so, too.

To make this easier, one should start by creating generic content stream editing functionality and then use it to edit out those watermarks.

Thus, here is first a sample generic content stream editor class, followed by a solution based on it that edits out the OP's sample watermark.

A generic content stream editor class

This PdfContentStreamEditor class parses the original content stream instruction by instruction, keeping track of a part of the graphics state. The instructions are forwarded to its Write method, which by default writes them back just as they come in, effectively creating an identical or at least equivalent copy of the original stream.

To actually edit the stream, simply override this Write method and forward only the instructions you want in the result stream to the base Write method.

using System;
using System.Collections.Generic;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

public class PdfContentStreamEditor : PdfContentStreamProcessor
{
    /**
     * This method edits the immediate contents of a page, i.e. its content stream.
     * It explicitly does not descend into form xobjects, patterns, or annotations.
     */
    public void EditPage(PdfStamper pdfStamper, int pageNum)
    {
        PdfReader pdfReader = pdfStamper.Reader;
        PdfDictionary page = pdfReader.GetPageN(pageNum);
        byte[] pageContentInput = ContentByteUtils.GetContentBytesForPage(pdfReader, pageNum);
        page.Remove(PdfName.CONTENTS);
        EditContent(pageContentInput, page.GetAsDict(PdfName.RESOURCES), pdfStamper.GetUnderContent(pageNum));
    }

    /**
     * This method processes the content bytes and outputs to the given canvas.
     * It explicitly does not descend into form xobjects, patterns, or annotations.
     */
    public void EditContent(byte[] contentBytes, PdfDictionary resources, PdfContentByte canvas)
    {
        this.canvas = canvas;
        ProcessContent(contentBytes, resources);
        this.canvas = null;
    }

    /**
     * This method writes content stream operations to the target canvas. The default
     * implementation writes them as they come, so it essentially generates identical
     * copies of the original instructions the {@link ContentOperatorWrapper} instances
     * forward to it.
     *
     * Override this method to achieve some fancy editing effect.
     */
    protected virtual void Write(PdfContentStreamProcessor processor, PdfLiteral operatorLit, List<PdfObject> operands)
    {
        int index = 0;

        foreach (PdfObject pdfObject in operands)
        {
            pdfObject.ToPdf(canvas.PdfWriter, canvas.InternalBuffer);
            canvas.InternalBuffer.Append(operands.Count > ++index ? (byte) ' ' : (byte) '\n');
        }
    }

    //
    // constructor giving the parent a dummy listener to talk to 
    //
    public PdfContentStreamEditor() : base(new DummyRenderListener())
    {
    }

    //
    // Overrides of PdfContentStreamProcessor methods
    //
    public override IContentOperator RegisterContentOperator(String operatorString, IContentOperator newOperator)
    {
        ContentOperatorWrapper wrapper = new ContentOperatorWrapper();
        wrapper.setOriginalOperator(newOperator);
        IContentOperator formerOperator = base.RegisterContentOperator(operatorString, wrapper);
        return formerOperator is ContentOperatorWrapper ? ((ContentOperatorWrapper)formerOperator).getOriginalOperator() : formerOperator;
    }

    public override void ProcessContent(byte[] contentBytes, PdfDictionary resources)
    {
        this.resources = resources; 
        base.ProcessContent(contentBytes, resources);
        this.resources = null;
    }

    //
    // members holding the output canvas and the resources
    //
    protected PdfContentByte canvas = null;
    protected PdfDictionary resources = null;

    //
    // A content operator class to wrap all content operators to forward the invocation to the editor
    //
    class ContentOperatorWrapper : IContentOperator
    {
        public IContentOperator getOriginalOperator()
        {
            return originalOperator;
        }

        public void setOriginalOperator(IContentOperator originalOperator)
        {
            this.originalOperator = originalOperator;
        }

        public void Invoke(PdfContentStreamProcessor processor, PdfLiteral oper, List<PdfObject> operands)
        {
            if (originalOperator != null && !"Do".Equals(oper.ToString()))
            {
                originalOperator.Invoke(processor, oper, operands);
            }
            ((PdfContentStreamEditor)processor).Write(processor, oper, operands);
        }

        private IContentOperator originalOperator = null;
    }

    //
    // A dummy render listener to give to the underlying content stream processor to feed events to
    //
    class DummyRenderListener : IRenderListener
    {
        public void BeginTextBlock() { }

        public void RenderText(TextRenderInfo renderInfo) { }

        public void EndTextBlock() { }

        public void RenderImage(ImageRenderInfo renderInfo) { }
    }
}

Some background:

This class extends the PdfContentStreamProcessor from the iTextSharp parser namespace. That class was originally designed merely to parse content streams and return information for text, image, or graphics extraction. We use it to keep track of a part of the graphics state, more exactly those graphics state parameters relevant for text extraction.

If a specific editing task also needs pre-processed information, e.g. on the text drawn by the current instruction, one can use a custom IRenderListener implementation to retrieve that information instead of the DummyRenderListener used here, which simply ignores it.

This class architecture is inspired by the PdfCleanUpProcessor from the iTextSharp.xtra extra library.

An editor to hide the OP's watermark

As the OP has already found out, his watermarks can be recognized as the only parts of the document that use transparency defined as a ca value in an ExtGState object. To hide the watermark we therefore have to

  • recognize graphics state changes with respect to that value and
  • not draw anything when the recognized current ca value is less than 1.

Actually, the watermark is built using vector graphics operations. Thus, we can restrict our editing to those operations. We can even restrict it to changing the final drawing instruction ("stroke" / "fill" / "fill-and-stroke" plus certain variations) so that it skips the part (filling or stroking) that generates transparent content.

public class TransparentGraphicsRemover : PdfContentStreamEditor
{
    protected override void Write(PdfContentStreamProcessor processor, PdfLiteral oper, List<PdfObject> operands)
    {
        String operatorString = oper.ToString();
        if ("gs".Equals(operatorString))
        {
            updateTransparencyFrom((PdfName) operands[0]);
        }

        PdfLiteral[] mapping;
        if (operatorMapping.TryGetValue(operatorString, out mapping))
        {
            // Downgrade the drawing operator if transparency is involved.
            // For details cf. the comment before the operatorMapping declaration.

            int index = 0;
            if (strokingAlpha < 1)
                index |= 1;
            if (nonStrokingAlpha < 1)
                index |= 2;

            oper = mapping[index];
            operands[operands.Count - 1] = oper;
        }

        base.Write(processor, oper, operands);
    }

    // The current transparency values; beware: save and restore state operations are ignored!
    float strokingAlpha = 1;
    float nonStrokingAlpha = 1;

    void updateTransparencyFrom(PdfName gsName)
    {
        PdfDictionary extGState = getGraphicsStateDictionary(gsName);
        if (extGState != null)
        {
            PdfNumber number = extGState.GetAsNumber(PdfName.ca);
            if (number != null)
                nonStrokingAlpha = number.FloatValue;
            number = extGState.GetAsNumber(PdfName.CA);
            if (number != null)
                strokingAlpha = number.FloatValue;
        }
    }

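    PdfDictionary getGraphicsStateDictionary(PdfName gsName)
    {
        PdfDictionary extGStates = resources.GetAsDict(PdfName.EXTGSTATE);
        // Guard against resources without an ExtGState dictionary; the caller handles null.
        return extGStates == null ? null : extGStates.GetAsDict(gsName);
    }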

    //
    // Map from an operator name to an array of operations it becomes depending
    // on the current graphics state:
    //
    // * [0] the operation in case of no transparency
    // * [1] the operation in case of stroking transparency
    // * [2] the operation in case of non-stroking transparency
    // * [3] the operation in case of stroking and non-stroking transparency
    //
    Dictionary<String, PdfLiteral[]> operatorMapping = new Dictionary<String, PdfLiteral[]>();

    public TransparentGraphicsRemover()
    {
        PdfLiteral _S = new PdfLiteral("S");
        PdfLiteral _s = new PdfLiteral("s");
        PdfLiteral _f = new PdfLiteral("f");
        PdfLiteral _fStar = new PdfLiteral("f*");
        PdfLiteral _B = new PdfLiteral("B");
        PdfLiteral _BStar = new PdfLiteral("B*");
        PdfLiteral _b = new PdfLiteral("b");
        PdfLiteral _bStar = new PdfLiteral("b*");
        PdfLiteral _n = new PdfLiteral("n");

        operatorMapping["S"] = new PdfLiteral[]{ _S, _n, _S, _n };
        operatorMapping["s"] = new PdfLiteral[]{ _s, _n, _s, _n };
        operatorMapping["f"] = new PdfLiteral[]{ _f, _f, _n, _n };
        operatorMapping["F"] = new PdfLiteral[]{ _f, _f, _n, _n };
        operatorMapping["f*"] = new PdfLiteral[]{ _fStar, _fStar, _n, _n };
        operatorMapping["B"] = new PdfLiteral[]{ _B, _f, _S, _n };
        operatorMapping["B*"] = new PdfLiteral[]{ _BStar, _fStar, _S, _n };
        operatorMapping["b"] = new PdfLiteral[] { _b, _f, _s, _n };
        operatorMapping["b*"] = new PdfLiteral[]{ _bStar, _fStar, _s, _n };
    }
}
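The downgrade table can be tried out in isolation. The following is a minimal sketch in Java (the answer's second language) that models the operators as plain strings instead of PdfLiteral objects; the class name OperatorDowngradeSketch and the method downgrade are made up for this illustration and are not part of iText.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-string model of the operatorMapping table; no iText types are involved.
// Class and method names are hypothetical and exist only for this illustration.
final class OperatorDowngradeSketch {
    private static final Map<String, String[]> MAPPING = new HashMap<>();

    static {
        // Columns: [0] opaque, [1] stroking alpha < 1,
        //          [2] non-stroking alpha < 1, [3] both alphas < 1
        MAPPING.put("S",  new String[] {"S",  "n",  "S", "n"});
        MAPPING.put("s",  new String[] {"s",  "n",  "s", "n"});
        MAPPING.put("f",  new String[] {"f",  "f",  "n", "n"});
        MAPPING.put("F",  new String[] {"f",  "f",  "n", "n"});
        MAPPING.put("f*", new String[] {"f*", "f*", "n", "n"});
        MAPPING.put("B",  new String[] {"B",  "f",  "S", "n"});
        MAPPING.put("B*", new String[] {"B*", "f*", "S", "n"});
        MAPPING.put("b",  new String[] {"b",  "f",  "s", "n"});
        MAPPING.put("b*", new String[] {"b*", "f*", "s", "n"});
    }

    // Returns the operator to write instead of the given one; operators
    // that do not paint a path are returned unchanged.
    static String downgrade(String operator, float strokingAlpha, float nonStrokingAlpha) {
        String[] row = MAPPING.get(operator);
        if (row == null) {
            return operator;
        }
        int index = 0;
        if (strokingAlpha < 1) {
            index |= 1; // bit 0: the stroke would be transparent
        }
        if (nonStrokingAlpha < 1) {
            index |= 2; // bit 1: the fill would be transparent
        }
        return row[index];
    }
}
```

For example, downgrade("B", 1f, 0.5f) selects column 2 and yields "S": the transparent fill is skipped while the opaque stroke is kept.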

Beware: This sample editor is very simple:

  • It only considers transparency created by the ExtGState parameters ca and CA; in particular, it ignores masks.
  • It does not look for operations saving or restoring the graphics state.

These limitations can easily be lifted but require more code than is appropriate for a Stack Overflow answer.
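As a rough idea of how the second limitation might be lifted, the alpha values could be kept on a stack that follows the q (save) and Q (restore) operators. The sketch below is plain Java without iText types; the class AlphaStateSketch and its methods are hypothetical names, and wiring it into the Write override is left out.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Tracks stroking (CA) and non-stroking (ca) alpha across q/Q operators.
// Hypothetical helper for illustration; not part of iText or the answer's code.
final class AlphaStateSketch {
    private float strokingAlpha = 1f;
    private float nonStrokingAlpha = 1f;
    private final Deque<float[]> saved = new ArrayDeque<>();

    // Call for every operator seen in the content stream.
    void onOperator(String operator) {
        if ("q".equals(operator)) {
            // Save the current alpha pair, as the graphics state stack would.
            saved.push(new float[] {strokingAlpha, nonStrokingAlpha});
        } else if ("Q".equals(operator) && !saved.isEmpty()) {
            // Restore the pair that was active at the matching q.
            float[] previous = saved.pop();
            strokingAlpha = previous[0];
            nonStrokingAlpha = previous[1];
        }
    }

    // Call when a gs operator selects an ExtGState; null means "entry absent".
    void applyExtGState(Float strokingCA, Float nonStrokingCa) {
        if (strokingCA != null) {
            strokingAlpha = strokingCA;
        }
        if (nonStrokingCa != null) {
            nonStrokingAlpha = nonStrokingCa;
        }
    }

    float strokingAlpha() {
        return strokingAlpha;
    }

    float nonStrokingAlpha() {
        return nonStrokingAlpha;
    }
}
```

With such a tracker, a watermark drawn between q and Q would no longer leave its alpha value active for the page content that follows.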

Applying this editor to the OP's sample file like this

string source = @"test3.pdf";
string dest = @"test3-noTransparency.pdf";

using (PdfReader pdfReader = new PdfReader(source))
using (PdfStamper pdfStamper = new PdfStamper(pdfReader, new FileStream(dest, FileMode.Create, FileAccess.Write)))
{
    PdfContentStreamEditor editor = new TransparentGraphicsRemover();

    for (int i = 1; i <= pdfReader.NumberOfPages; i++)
    {
        editor.EditPage(pdfStamper, i);
    }
}

results in a PDF file without the watermark.

I don't have the tools the OP used to export the contents to Word (NitroPDF and Foxit), so I could not run a final test. Adobe Acrobat (version 9.5), at least, does not include the watermark upon export to Word.

If the OP's tools still have traces of the watermark in the exported Word files, one can easily improve this class to actually drop path creation and drawing operations while transparency is active.
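A simplified model of that improvement is sketched here in plain Java. It treats each instruction as one text line ending in its operator and drops path construction and painting instructions while the non-stroking alpha is below 1. The class name, the line-based representation, and the alpha lookup map are assumptions for illustration; the sketch also ignores stroking alpha, q/Q, and clipping operators (W, W*), which a real implementation would have to handle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Drops path construction and painting instructions while the current
// non-stroking alpha is below 1. Hypothetical, simplified model.
final class TransparentPathDropSketch {
    private static final Set<String> PATH_OPERATORS = new HashSet<>(Arrays.asList(
            "m", "l", "c", "v", "y", "h", "re",                 // path construction
            "S", "s", "f", "F", "f*", "B", "B*", "b", "b*", "n" // path painting
    ));

    // instructions: one instruction per entry, operands first, operator last.
    // fillAlphaByGsName: ca value per ExtGState resource name, e.g. "/GS1".
    static List<String> filter(List<String> instructions, Map<String, Float> fillAlphaByGsName) {
        List<String> kept = new ArrayList<>();
        float alpha = 1f;
        for (String instruction : instructions) {
            String[] tokens = instruction.trim().split("\\s+");
            String operator = tokens[tokens.length - 1];
            if ("gs".equals(operator) && tokens.length == 2) {
                Float ca = fillAlphaByGsName.get(tokens[0]);
                if (ca != null) {
                    alpha = ca; // an ExtGState without ca leaves alpha unchanged
                }
            }
            if (alpha < 1f && PATH_OPERATORS.contains(operator)) {
                continue; // transparency is active: drop the path instruction
            }
            kept.add(instruction);
        }
        return kept;
    }
}
```

Given the instructions "/GS1 gs", "0 0 10 10 re", "f", "/GS2 gs", "0 0 5 5 re", "f" with /GS1 mapped to 0.5 and /GS2 to 1.0, only the first rectangle and its fill are dropped.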

The same in Java

I started implementing this for iText in Java and only later realized the OP had iTextSharp in .NET in mind. Here are the equivalent Java classes:

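import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import com.itextpdf.text.pdf.*;
import com.itextpdf.text.pdf.parser.*;

public class PdfContentStreamEditor extends PdfContentStreamProcessor
{
    /**
     * This method edits the immediate contents of a page, i.e. its content stream.
     * It explicitly does not descend into form xobjects, patterns, or annotations.
     */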
    public void editPage(PdfStamper pdfStamper, int pageNum) throws IOException
    {
        PdfReader pdfReader = pdfStamper.getReader();
        PdfDictionary page = pdfReader.getPageN(pageNum);
        byte[] pageContentInput = ContentByteUtils.getContentBytesForPage(pdfReader, pageNum);
        page.remove(PdfName.CONTENTS);
        editContent(pageContentInput, page.getAsDict(PdfName.RESOURCES), pdfStamper.getUnderContent(pageNum));
    }

    /**
     * This method processes the content bytes and outputs to the given canvas.
     * It explicitly does not descend into form xobjects, patterns, or annotations.
     */
    public void editContent(byte[] contentBytes, PdfDictionary resources, PdfContentByte canvas)
    {
        this.canvas = canvas;
        processContent(contentBytes, resources);
        this.canvas = null;
    }

    /**
     * <p>
     * This method writes content stream operations to the target canvas. The default
     * implementation writes them as they come, so it essentially generates identical
     * copies of the original instructions the {@link ContentOperatorWrapper} instances
     * forward to it.
     * </p>
     * <p>
     * Override this method to achieve some fancy editing effect.
     * </p> 
     */
    protected void write(PdfContentStreamProcessor processor, PdfLiteral operator, List<PdfObject> operands) throws IOException
    {
        int index = 0;

        for (PdfObject object : operands)
        {
            object.toPdf(canvas.getPdfWriter(), canvas.getInternalBuffer());
            canvas.getInternalBuffer().append(operands.size() > ++index ? (byte) ' ' : (byte) '\n');
        }
    }

    //
    // constructor giving the parent a dummy listener to talk to 
    //
    public PdfContentStreamEditor()
    {
        super(new DummyRenderListener());
    }

    //
    // Overrides of PdfContentStreamProcessor methods
    //
    @Override
    public ContentOperator registerContentOperator(String operatorString, ContentOperator operator)
    {
        ContentOperatorWrapper wrapper = new ContentOperatorWrapper();
        wrapper.setOriginalOperator(operator);
        ContentOperator formerOperator = super.registerContentOperator(operatorString, wrapper);
        return formerOperator instanceof ContentOperatorWrapper ? ((ContentOperatorWrapper)formerOperator).getOriginalOperator() : formerOperator;
    }

    @Override
    public void processContent(byte[] contentBytes, PdfDictionary resources)
    {
        this.resources = resources; 
        super.processContent(contentBytes, resources);
        this.resources = null;
    }

    //
    // members holding the output canvas and the resources
    //
    protected PdfContentByte canvas = null;
    protected PdfDictionary resources = null;

    //
    // A content operator class to wrap all content operators to forward the invocation to the editor
    //
    class ContentOperatorWrapper implements ContentOperator
    {
        public ContentOperator getOriginalOperator()
        {
            return originalOperator;
        }

        public void setOriginalOperator(ContentOperator originalOperator)
        {
            this.originalOperator = originalOperator;
        }

        @Override
        public void invoke(PdfContentStreamProcessor processor, PdfLiteral operator, ArrayList<PdfObject> operands) throws Exception
        {
            if (originalOperator != null && !"Do".equals(operator.toString()))
            {
                originalOperator.invoke(processor, operator, operands);
            }
            write(processor, operator, operands);
        }

        private ContentOperator originalOperator = null;
    }

    //
    // A dummy render listener to give to the underlying content stream processor to feed events to
    //
    static class DummyRenderListener implements RenderListener
    {
        @Override
        public void beginTextBlock() { }

        @Override
        public void renderText(TextRenderInfo renderInfo) { }

        @Override
        public void endTextBlock() { }

        @Override
        public void renderImage(ImageRenderInfo renderInfo) { }
    }
}

(PdfContentStreamEditor.java)
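
The wrapping done in registerContentOperator can be shown without iText at all. The following is only a minimal sketch: `WrapperDemo`, its `Operator` interface, and the `written` list are hypothetical stand-ins for the editor, iText's ContentOperator, and the output canvas. Unlike iText, the operator name is passed separately here rather than as the last operand. The sketch shows how each registered operator is replaced by a wrapper that first calls the original and then forwards the instruction to a write step.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WrapperDemo {
    // Hypothetical stand-in for iText's ContentOperator
    interface Operator {
        void invoke(String name, List<String> operands);
    }

    final Map<String, Operator> registry = new HashMap<>();
    // Stand-in for the output canvas: every instruction that was "written back"
    final List<String> written = new ArrayList<>();

    // Registers a wrapped operator and returns the unwrapped operator
    // that was registered before under the same name, if any
    Operator register(String name, Operator original) {
        Operator former = registry.put(name, new Wrapper(original));
        return former instanceof Wrapper ? ((Wrapper) former).original : former;
    }

    class Wrapper implements Operator {
        final Operator original;

        Wrapper(Operator original) {
            this.original = original;
        }

        @Override
        public void invoke(String name, List<String> operands) {
            // Let the original operator update its state first
            if (original != null) {
                original.invoke(name, operands);
            }
            // Then forward the instruction to the write step
            written.add(String.join(" ", operands) + " " + name);
        }
    }
}
```

Because callers only ever see the unwrapped operator returned by `register`, the wrapping stays invisible to code that registers or replaces operators.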

public class TransparentGraphicsRemover extends PdfContentStreamEditor
{
    @Override
    protected void write(PdfContentStreamProcessor processor, PdfLiteral operator, List<PdfObject> operands) throws IOException
    {
        String operatorString = operator.toString();
        if ("gs".equals(operatorString))
        {
            updateTransparencyFrom((PdfName) operands.get(0));
        }

        PdfLiteral[] mapping = operatorMapping.get(operatorString);

        if (mapping != null)
        {
            int index = 0;
            if (strokingAlpha < 1)
                index |= 1;
            if (nonStrokingAlpha < 1)
                index |= 2;

            operator = mapping[index];
            operands.set(operands.size() - 1, operator);
        }

        super.write(processor, operator, operands);
    }

    // The current transparency values; beware: save and restore state operations are ignored!
    float strokingAlpha = 1;
    float nonStrokingAlpha = 1;

    void updateTransparencyFrom(PdfName gsName)
    {
        PdfDictionary extGState = getGraphicsStateDictionary(gsName);
        if (extGState != null)
        {
            PdfNumber number = extGState.getAsNumber(PdfName.ca);
            if (number != null)
                nonStrokingAlpha = number.floatValue();
            number = extGState.getAsNumber(PdfName.CA);
            if (number != null)
                strokingAlpha = number.floatValue();
        }
    }

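    PdfDictionary getGraphicsStateDictionary(PdfName gsName)
    {
        // The resources may lack an /ExtGState dictionary; return null instead of throwing
        PdfDictionary extGStates = resources.getAsDict(PdfName.EXTGSTATE);
        return extGStates == null ? null : extGStates.getAsDict(gsName);
    }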

    //
    // Map from an operator name to an array of operations it becomes depending
    // on the current graphics state:
    //
    // * [0] the operation in case of no transparency
    // * [1] the operation in case of stroking transparency
    // * [2] the operation in case of non-stroking transparency
    // * [3] the operation in case of stroking and non-stroking transparency
    //
    static Map<String, PdfLiteral[]> operatorMapping = new HashMap<String, PdfLiteral[]>();
    static
    {
        PdfLiteral _S = new PdfLiteral("S");
        PdfLiteral _s = new PdfLiteral("s");
        PdfLiteral _f = new PdfLiteral("f");
        PdfLiteral _fStar = new PdfLiteral("f*");
        PdfLiteral _B = new PdfLiteral("B");
        PdfLiteral _BStar = new PdfLiteral("B*");
        PdfLiteral _b = new PdfLiteral("b");
        PdfLiteral _bStar = new PdfLiteral("b*");
        PdfLiteral _n = new PdfLiteral("n");

        operatorMapping.put("S", new PdfLiteral[]{ _S, _n, _S, _n });
        operatorMapping.put("s", new PdfLiteral[]{ _s, _n, _s, _n });
        operatorMapping.put("f", new PdfLiteral[]{ _f, _f, _n, _n });
        operatorMapping.put("F", new PdfLiteral[]{ _f, _f, _n, _n });
        operatorMapping.put("f*", new PdfLiteral[]{ _fStar, _fStar, _n, _n });
        operatorMapping.put("B", new PdfLiteral[]{ _B, _f, _S, _n });
        operatorMapping.put("B*", new PdfLiteral[]{ _BStar, _fStar, _S, _n });
        operatorMapping.put("b", new PdfLiteral[]{ _b, _f, _s, _n });
        operatorMapping.put("b*", new PdfLiteral[]{ _bStar, _fStar, _s, _n });
    }
}

(TransparentGraphicsRemover.java)
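
To make the table lookup in TransparentGraphicsRemover easier to follow, here is a small standalone sketch of the same index calculation. It assumes plain strings in place of PdfLiteral and copies only a few of the table rows; the class name `OperatorMappingDemo` and the method `rewrite` are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class OperatorMappingDemo {
    // Same layout as operatorMapping: [0] opaque, [1] stroking transparent,
    // [2] non-stroking transparent, [3] both transparent
    static final Map<String, String[]> MAPPING = new HashMap<>();
    static {
        MAPPING.put("S",  new String[] { "S",  "n",  "S", "n" });
        MAPPING.put("f",  new String[] { "f",  "f",  "n", "n" });
        MAPPING.put("B",  new String[] { "B",  "f",  "S", "n" });
        MAPPING.put("b*", new String[] { "b*", "f*", "s", "n" });
    }

    // Returns the operator to write instead of the given one
    static String rewrite(String operator, float strokingAlpha, float nonStrokingAlpha) {
        String[] candidates = MAPPING.get(operator);
        if (candidates == null) {
            return operator; // not a path-painting operator, leave it unchanged
        }
        int index = 0;
        if (strokingAlpha < 1) {
            index |= 1; // bit 0: stroking is transparent
        }
        if (nonStrokingAlpha < 1) {
            index |= 2; // bit 1: filling is transparent
        }
        return candidates[index];
    }

    public static void main(String[] args) {
        System.out.println(rewrite("B", 0.5f, 1f));     // f: the fill is kept, the stroke is dropped
        System.out.println(rewrite("B", 1f, 0.5f));     // S: the stroke is kept, the fill is dropped
        System.out.println(rewrite("b*", 0.5f, 0.5f));  // n: the path is ended without painting
    }
}
```

The replacement is always another path-painting or path-ending operator, so the path construction operators before it still form a valid sequence.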

@Test
public void testRemoveTransparentGraphicsTest3() throws IOException, DocumentException
{
    try (   InputStream resource = getClass().getResourceAsStream("test3.pdf");
            OutputStream result = new FileOutputStream(new File(RESULT_FOLDER, "test3-noTransparency.pdf")))
    {
        PdfReader pdfReader = new PdfReader(resource);
        PdfStamper pdfStamper = new PdfStamper(pdfReader, result);
        PdfContentStreamEditor editor = new TransparentGraphicsRemover();

        for (int i = 1; i <= pdfReader.getNumberOfPages(); i++)
        {
            editor.editPage(pdfStamper, i);
        }

        pdfStamper.close();
    }
}

(excerpt from EditPageContent.java)
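
TransparentGraphicsRemover notes in a comment that save and restore state operations are ignored, so an alpha value set between q and Q stays in effect after the Q. One possible way to close that gap is to keep a stack of alpha values. This is only a sketch under that assumption, not part of the original answer: the class `AlphaStateTracker` and its methods are hypothetical, and wiring it into the write method is left out.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class AlphaStateTracker {
    private float strokingAlpha = 1f;
    private float nonStrokingAlpha = 1f;
    // Alpha pairs saved by q, most recent first
    private final Deque<float[]> saved = new ArrayDeque<>();

    // Call for every operator seen in the content stream
    public void onOperator(String operator) {
        if ("q".equals(operator)) {
            saved.push(new float[] { strokingAlpha, nonStrokingAlpha });
        } else if ("Q".equals(operator) && !saved.isEmpty()) {
            float[] state = saved.pop();
            strokingAlpha = state[0];
            nonStrokingAlpha = state[1];
        }
    }

    // Call when a gs operator selects an ExtGState with CA and ca entries
    public void setAlpha(float stroking, float nonStroking) {
        strokingAlpha = stroking;
        nonStrokingAlpha = nonStroking;
    }

    public float getStrokingAlpha() {
        return strokingAlpha;
    }

    public float getNonStrokingAlpha() {
        return nonStrokingAlpha;
    }
}
```

An unbalanced Q is simply ignored here; a stricter variant could treat it as a malformed stream instead.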
