Removing text-based watermarks using iTextSharp


Question

According to this post (Removing Watermark from PDF iTextSharp), @mkl's code works fine for ExtGState-based graphical watermarks. I have tested that code on some files that have text-based watermarks behind the PDF contents (like this file: http://s000.tinyupload.com/index.php?file_id=05961025831018336372), but it does not remove them. I have tried multiple solutions found on this site without success. Can anyone help remove this type of watermark by adapting @mkl's solution?

Thanks

Recommended answer

Just like in the case of the question the OP references (Removing Watermark from PDF iTextSharp), you can remove the watermark from your sample file by building upon the PdfContentStreamEditor class presented in my answer to that question.

In contrast to the solution in that other answer, though, we do not want to hide vector graphics based on some transparency value. Instead, we want to hide the writing "Archive of SID" in the page background.

First we have to select a criterion by which to recognize the background text. Let's use the fact that this writing is by far the largest text on the page. Using this criterion makes the task at hand essentially the iTextSharp/C# counterpart to this iText/Java solution.

There is a problem, though. As mentioned in that answer:

"The gs().getFontSize() used in the second sample may not be what you expect it to be as sometimes the coordinate system has been stretched by the current transformation matrix and the text matrix. The code can be extended to consider these effects."

Exactly this is happening here: a font size of 1 is used, and that small text is then stretched by means of the text matrix:

/NxF0 1 Tf
49.516754 49.477234 -49.477234 49.516754 176.690933 217.316086 Tm
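
As a rough check of how strongly this text matrix enlarges the text, the following minimal sketch applies the linear part of a PDF matrix to a vertical vector of font-size length and returns the resulting length. The names EffectiveFontSize and Compute are illustrative and not part of iTextSharp, and the current transformation matrix is assumed to be the identity, which may not hold for every file.

```csharp
using System;

static class EffectiveFontSize
{
    // Length of the vector (0, fontSize) after multiplying it, as a row vector,
    // with the linear part [[a, b], [c, d]] of a PDF matrix "a b c d e f".
    // The translation part (e, f) does not change the length of a direction vector.
    public static double Compute(double fontSize, double a, double b, double c, double d)
    {
        // (0, fontSize) * [[a, b], [c, d]] = (fontSize * c, fontSize * d)
        double x = fontSize * c;
        double y = fontSize * d;
        return Math.Sqrt(x * x + y * y);
    }
}
```

Calling EffectiveFontSize.Compute(1, 49.516754, 49.477234, -49.477234, 49.516754) with the values of the Tm operation above yields a value just below 70, so the nominal font size of 1 is rendered at roughly 70 units.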

Thus, we need to take the text matrix into account. Unfortunately, the text matrix is a private member of PdfContentStreamProcessor, so we will also need some reflection magic.
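
The reflection step can be sketched in isolation as follows. Processor and Editor are hypothetical stand-ins for a library class with a private field and a class derived from it; PrivateFieldReader is likewise an illustrative helper, not an iTextSharp type. The sketch also checks for a missing field, which is an assumption about how one might want to fail if a library version renames the member.

```csharp
using System;
using System.Reflection;

// Hypothetical stand-in for a library class that keeps its state private.
class Processor
{
    private string textMatrix = "[1 0 0 1 0 0]";
}

// Hypothetical subclass, playing the role of an editor derived from the processor.
class Editor : Processor { }

static class PrivateFieldReader
{
    // Reads a private instance field that is declared on declaringType from target.
    public static object Read(object target, Type declaringType, string fieldName)
    {
        FieldInfo field = declaringType.GetField(fieldName, BindingFlags.NonPublic | BindingFlags.Instance);
        if (field == null)
            throw new MissingFieldException(declaringType.FullName, fieldName);
        return field.GetValue(target);
    }
}
```

PrivateFieldReader.Read(new Editor(), typeof(Processor), "textMatrix") returns the field value, whereas passing typeof(Editor) throws, because a private field of a base class is not found through the derived type. This is why the field has to be looked up on the declaring class rather than on the subclass.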

A possible background remover for that file therefore looks like this:

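using System.Collections.Generic;
using System.Reflection;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Drops every text-showing operation whose effective font size exceeds a threshold.
// PdfContentStreamEditor is the helper class from the referenced answer.
class BigTextRemover : PdfContentStreamEditor
{
    protected override void Write(PdfContentStreamProcessor processor, PdfLiteral operatorLit, List<PdfObject> operands)
    {
        if (TEXT_SHOWING_OPERATORS.Contains(operatorLit.ToString()))
        {
            // Transform a vertical vector of font-size length by the text matrix and
            // the CTM; the length of the result is the effective font size on the page.
            Vector fontSizeVector = new Vector(0, Gs().FontSize, 0);
            Matrix textMatrix = (Matrix) textMatrixField.GetValue(this);
            Matrix currentTransformationMatrix = Gs().GetCtm();
            Vector transformedVector = fontSizeVector.Cross(textMatrix).Cross(currentTransformationMatrix);
            float transformedFontSize = transformedVector.Length;
            // Skip (do not write) text larger than the threshold.
            if (transformedFontSize > 40)
                return;
        }
        base.Write(processor, operatorLit, operands);
    }

    // textMatrix is a private field of PdfContentStreamProcessor, so it is read via reflection.
    FieldInfo textMatrixField = typeof(PdfContentStreamProcessor).GetField("textMatrix", BindingFlags.NonPublic | BindingFlags.Instance);
    List<string> TEXT_SHOWING_OPERATORS = new List<string>{"Tj", "'", "\"", "TJ"};
}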

The threshold of 40 has been chosen with that text matrix in mind: the matrix scales the size-1 font by a factor of about 70.

Applying it like this

[Test]
public void testRemoveBigText()
{
    string source = @"sid-1.pdf";
    string dest = @"sid-1-noBigText.pdf";

    using (PdfReader pdfReader = new PdfReader(source))
    using (PdfStamper pdfStamper = new PdfStamper(pdfReader, new FileStream(dest, FileMode.Create, FileAccess.Write)))
    {
        PdfContentStreamEditor editor = new BigTextRemover();

        for (int i = 1; i <= pdfReader.NumberOfPages; i++)
        {
            editor.EditPage(pdfStamper, i);
        }
    }
}

to your sample file results in a PDF with the large background text removed.
