PDF添加文本和拼合 [英] PDF Add Text and Flatten

查看:469
本文介绍了PDF添加文本和拼合的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我开发一个显示的PDF文件,并​​允许用户订购文件的副本的Web应用程序。我们要显示PDF文件时添加文本,如无薪或样品,对飞。我一直在使用iTextSharp的做到了这一点。然而,页面图像容易从水印文本分离并使用各种免费软件程序提取。

I'm developing a web application that displays PDFs and allows users to order copies of the documents. We want to add text, such as "unpaid" or "sample", on the fly when the PDF is displayed. I have accomplished this using itextsharp. However, the page images are easily separated from the watermark text and extracted using a variety of freeware programs.

如何能在水印添加到PDF中的网页,但平坦化页面图像并使得水印变为PDF页面图像的一部分,从而被除去$ P $从pventing水印水印一起(除非该人想用photoshop)?

How can I add the watermark to the pages in the PDF, but flatten the page images and watermark together so that the watermark becomes part of the pdf page image, thereby preventing the watermark from being removed (unless the person wants to use photoshop)?

推荐答案

如果我是你,我会往下走一条不同的道路。使用iTextSharp的(或其他库)中提取一个给定的文档的每一页到一个文件夹。然后使用一些程序(Ghostscript的,Photoshop中,也许GIMP),您可以批量和每个页面转换为图像。然后,写你的文字叠加到图像上。最后使用iTextSharp的所有每个文件夹中的所有影像合并回一个PDF。

If I were you I would go down a different path. Using iTextSharp (or another library) extract each page of a given document to a folder. Then use some program (Ghostscript, Photoshop, maybe GIMP) that you can batch and convert each page to an image. Then write your overlay text onto the images. Finally use iTextSharp to combine all of the images in each folder back into a PDF.

我知道这听起​​来像一个痛苦,但你应该只需要每个文档我认为这样做一次。

I know this sounds like a pain but you should only have to do this once per document I assume.

如果您不希望沿着这条路走下去,让我让你去,你需要做的,提取图像内容。下面大部分的code来源于这个帖子。在code的结束,我保存图像到桌面。既然你已经有了原始字节,所以你也可以很容易地抽那些成为System.Drawing.Image 对象,并将其写回到一个新的 PdfWriter 对象,它是听起来像你所熟悉的。下面是一个完整的工作WinForms应用程序打靶iTextSharp的5.1.1.0

If you don't want to go down this route, let me get you going on what you need to do to extract images. Much of the code below comes from this post. At the end of the code I'm saving the images to the desktop. Since you've got raw bytes so you could also easily pump those into a System.Drawing.Image object and write them back into a new PdfWriter object which is sounds like you are familiar with. Below is a full working WinForms app targetting iTextSharp 5.1.1.0

Option Explicit On
Option Strict On

Imports iTextSharp.text
Imports iTextSharp.text.pdf
Imports System.IO
Imports System.Runtime.InteropServices

Public Class Form1

    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        ''//File to process
        Dim InputFile = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "SampleImage.pdf")

        ''//Bind a reader to our PDF
        Dim R As New PdfReader(InputFile)

        ''//Setup some variable to use below
        Dim bytes() As Byte
        Dim obj As PdfObject
        Dim pd As PdfDictionary
        Dim filter, width, height, bpp As String
        Dim pixelFormat As System.Drawing.Imaging.PixelFormat
        Dim bmp As System.Drawing.Bitmap
        Dim bmd As System.Drawing.Imaging.BitmapData

        ''//Loop through all of the references in the file
        Dim xo = R.XrefSize
        For I = 0 To xo - 1
            ''//Get the object
            obj = R.GetPdfObject(I)
            ''//Make sure we have something and that it is a stream
            If (obj IsNot Nothing) AndAlso obj.IsStream() Then
                ''//Case it to a dictionary object
                pd = DirectCast(obj, PdfDictionary)
                ''//See if it has a subtype property that is set to /IMAGE
                If pd.Contains(PdfName.SUBTYPE) AndAlso pd.Get(PdfName.SUBTYPE).ToString() = PdfName.IMAGE.ToString() Then
                    ''//Grab various properties of the image
                    filter = pd.Get(PdfName.FILTER).ToString()
                    width = pd.Get(PdfName.WIDTH).ToString()
                    height = pd.Get(PdfName.HEIGHT).ToString()
                    bpp = pd.Get(PdfName.BITSPERCOMPONENT).ToString()

                    ''//Grab the raw bytes of the image
                    bytes = PdfReader.GetStreamBytesRaw(DirectCast(obj, PRStream))

                    ''//Images can be encoded in various ways. /DCTDECODE is the simplest because its essentially JPEG and can be treated as such.
                    ''//If your PDFs contain the other types you will need to figure out how to handle those on your own
                    Select Case filter
                        Case PdfName.ASCII85DECODE.ToString()
                            Throw New NotImplementedException("Decoding this filter has not been implemented")
                        Case PdfName.ASCIIHEXDECODE.ToString()
                            Throw New NotImplementedException("Decoding this filter has not been implemented")
                        Case PdfName.FLATEDECODE.ToString()
                            ''//This code from http://stackoverflow.com/questions/802269/itextsharp-extract-images/1220959#1220959
                            bytes = pdf.PdfReader.FlateDecode(bytes, True)
                            Select Case Integer.Parse(bpp)
                                Case 1
                                    pixelFormat = Drawing.Imaging.PixelFormat.Format1bppIndexed
                                Case 24
                                    pixelFormat = Drawing.Imaging.PixelFormat.Format24bppRgb
                                Case Else
                                    Throw New Exception("Unknown pixel format " + bpp)
                            End Select
                            bmp = New System.Drawing.Bitmap(Int32.Parse(width), Int32.Parse(height), pixelFormat)
                            bmd = bmp.LockBits(New System.Drawing.Rectangle(0, 0, Int32.Parse(width), Int32.Parse(height)), System.Drawing.Imaging.ImageLockMode.WriteOnly, pixelFormat)
                            Marshal.Copy(bytes, 0, bmd.Scan0, bytes.Length)
                            bmp.UnlockBits(bmd)
                            Using ms As New MemoryStream
                                bmp.Save(ms, System.Drawing.Imaging.ImageFormat.Jpeg)
                                bytes = ms.GetBuffer()
                            End Using
                        Case PdfName.LZWDECODE.ToString()
                            Throw New NotImplementedException("Decoding this filter has not been implemented")
                        Case PdfName.RUNLENGTHDECODE.ToString()
                            Throw New NotImplementedException("Decoding this filter has not been implemented")
                        Case PdfName.DCTDECODE.ToString()
                            ''//Bytes should be raw JPEG so they should not need to be decoded, hopefully
                        Case PdfName.CCITTFAXDECODE.ToString()
                            Throw New NotImplementedException("Decoding this filter has not been implemented")
                        Case PdfName.JBIG2DECODE.ToString()
                            Throw New NotImplementedException("Decoding this filter has not been implemented")
                        Case PdfName.JPXDECODE.ToString()
                            Throw New NotImplementedException("Decoding this filter has not been implemented")
                        Case Else
                            Throw New ApplicationException("Unknown filter found : " & filter)
                    End Select

                    ''//At this points the byte array should contain a valid JPEG byte data, write to disk
                    My.Computer.FileSystem.WriteAllBytes(Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), I & ".jpg"), bytes, False)
                End If
            End If

        Next

        Me.Close()
    End Sub
End Class

这篇关于PDF添加文本和拼合的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆