PdfContentStreamEditor rotating image on PDF file


Problem description

I have what I hope is an easy question. I'm trying to use iTextSharp to modify some PDF files. However, it seems that the XMP metadata that iTextSharp puts at the end of the files is ruining the layout of the PDF files (and I don't know the PDF format well enough to understand why).

Compared with the original, the edited document appears to have been rotated. Looking at a binary diff of the two PDF files, however, the only difference appears to be some XMP metadata at the end of the file.

I've tried opening the files in several PDF viewers (Sumatra PDF, Edge Browser and Adobe Acrobat) and all show the same weirdness.

I guess I have two questions:
a) How can the PDF file be altered so much just by having XMP metadata at the end of the file?
b) How can I make iTextSharp not produce this output? (iTextSharp only seems to do this when I add or edit content, not if I just strip out JavaScript or similar.)

<EDIT 1>
The code that I'm using for the iTextSharp is the PdfContentStreamEditor (verbatim) from the post here: https://stackoverflow.com/a/35915789/2535822
</EDIT 1>
<EDIT 2>
OK, it seems that it's not the XMP metadata. I got rid of that by using:

pdfStamper.XmpMetadata = new byte[0];

However, there is still a bunch of extra data placed at the end of the file:

2 0 obj
<</Producer(PDFCreator 2.5.2.5233; modified using iTextSharp’ 5.5.13 ©2000-2018 iText Group NV \(AGPL-version\))/CreationDate(D:20171206173510+10'30')/ModDate(D:20180325144710+11'00')/Title(þÿ
endobj
404 0 obj
<</Length 0/Type/Metadata/Subtype/XML>>stream

endstream
endobj
405 0 obj
<</Length 3638/Filter/FlateDecode>>stream
xœÍZmÅ/6ÒZ2ÁÆ€
....

</EDIT 2>

Solution

You have indeed found a bug in the PdfContentStreamEditor I used in the referenced answer; it causes the scrambled text. The rotation issue is different: it requires knowing how to disable a special feature (or quirk, depending on the circumstances) of iText.

Rotation of the content

This part deals with the rotation of content in the sample document PHA-Pro 8 - File.pdf provided by the OP.

As you have already seen yourself, the rotation issue appears to be connected with the fact that the rotation of the page in question is not 0.

Indeed, the iText PdfStamper has a feature which, for rotated pages, automatically rotates any additions applied to the OverContent or UnderContent. This feature can be quite handy if you want to add upright content to the page without having to apply the rotation yourself. In the case of the PdfContentStreamEditor, though, all coordinates we receive from the existing content already have the applicable rotation factored in.

Thus, we need to disable this feature. One can do so using the PdfStamper property RotateContents:

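using (PdfReader pdfReader = new PdfReader(source))
// (char)0 keeps the PDF version of the source; true opens the stamper in append mode
using (PdfStamper pdfStamper = new PdfStamper(pdfReader, new FileStream(dest, FileMode.Create, FileAccess.Write), (char)0, true))
{
    // Prevent iText from rotating the rewritten content a second time on rotated pages
    pdfStamper.RotateContents = false;
    PdfContentStreamEditor editor = new PdfContentStreamEditor();

    // Page numbers in iText are 1-based
    for (int i = 1; i <= pdfReader.NumberOfPages; i++)
    {
        editor.EditPage(pdfStamper, i);
    }
}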

Scrambling of text

This part deals with the scrambling of text in the sample document AS62061-2006.pdf provided by the OP.

You have found a bug in the PdfContentStreamEditor. Its Write method contains this loop:

foreach (PdfObject pdfObject in operands)
{
    pdfObject.ToPdf(canvas.PdfWriter, canvas.InternalBuffer);
    canvas.InternalBuffer.Append(operands.Count > ++index ? (byte) ' ' : (byte) '\n');
}

It should instead be:

foreach (PdfObject pdfObject in operands)
{
    pdfObject.ToPdf(null, canvas.InternalBuffer);
    canvas.InternalBuffer.Append(operands.Count > ++index ? (byte) ' ' : (byte) '\n');
}

If one passes the PdfWriter to the ToPdf method of a PdfString and the PdfWriter uses encryption, the string contents are encrypted. Here, however, the string is written into a content stream. In that case the individual string must not be encrypted; instead, the whole stream is eventually encrypted.

This applies to the PDF provided by the OP because

  • the PDF is encrypted using the default password and
  • the OP edited using a PdfStamper in append mode which encrypts the additions using the same password as the original file.
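
To illustrate why the extra string encryption garbles the text, here is a minimal toy sketch. It does not use iText at all: a plain RC4 routine with a fixed key stands in for the PDF security handler, and the class and method names (DoubleEncryptionDemo, Rc4, BuildContent, ViewerSees) are hypothetical helpers made up for this illustration. Real PDF encryption derives per-object keys and escapes string bytes, which this sketch ignores; it only shows that a viewer which decrypts the stream once cannot recover a string that was encrypted separately beforehand.

```csharp
using System;
using System.Linq;
using System.Text;

// Toy model only: RC4 with a fixed key stands in for the PDF security handler.
public static class DoubleEncryptionDemo
{
    // Plain RC4; encryption and decryption are the same operation.
    public static byte[] Rc4(byte[] key, byte[] data)
    {
        byte[] s = new byte[256];
        for (int i = 0; i < 256; i++) s[i] = (byte)i;
        for (int i = 0, j = 0; i < 256; i++)
        {
            j = (j + s[i] + key[i % key.Length]) & 0xFF;
            (s[i], s[j]) = (s[j], s[i]);
        }
        byte[] output = new byte[data.Length];
        for (int n = 0, i = 0, j = 0; n < data.Length; n++)
        {
            i = (i + 1) & 0xFF;
            j = (j + s[i]) & 0xFF;
            (s[i], s[j]) = (s[j], s[i]);
            output[n] = (byte)(data[n] ^ s[(s[i] + s[j]) & 0xFF]);
        }
        return output;
    }

    // Builds the raw bytes of a tiny content stream: BT (<string bytes>) Tj ET
    public static byte[] BuildContent(byte[] stringBytes)
    {
        return Encoding.ASCII.GetBytes("BT (")
            .Concat(stringBytes)
            .Concat(Encoding.ASCII.GetBytes(") Tj ET"))
            .ToArray();
    }

    // Returns what a viewer gets after decrypting the stored stream once.
    public static string ViewerSees(byte[] key, bool encryptStringFirst)
    {
        byte[] text = Encoding.ASCII.GetBytes("Hello");
        if (encryptStringFirst)
            text = Rc4(key, text);                           // the extra, unwanted step
        byte[] storedStream = Rc4(key, BuildContent(text));  // stream-level encryption
        byte[] decrypted = Rc4(key, storedStream);           // the viewer decrypts once
        return Encoding.GetEncoding("ISO-8859-1").GetString(decrypted);
    }

    public static void Main()
    {
        byte[] key = Encoding.ASCII.GetBytes("Key");
        Console.WriteLine(ViewerSees(key, false)); // BT (Hello) Tj ET
        Console.WriteLine(ViewerSees(key, true));  // the string operand is still encrypted
    }
}
```

In the second call the operators BT, Tj and ET survive, but the string operand is still ciphertext, which corresponds to the scrambled text seen in the edited document.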

With the original code, the text in the result is scrambled. With the fixed code, it is displayed correctly.
