Merging N PDF files, created from HTML using iTextSharp, into another blank PDF file

Question

I need to merge N PDF files into one. I create a blank file first:

byte[] pdfBytes = null;

var ms = new MemoryStream();
var doc = new iTextSharp.text.Document();
var cWriter = new PdfCopy(doc, ms);

Later I loop over an array of HTML strings:

foreach (NBElement htmlString in someElement.Children())
{
    byte[] msTempDoc = getPdfDocFrom(htmlString.GetString(), cssString.GetString());
    addPagesToPdf(cWriter, msTempDoc);
}

In getPdfDocFrom I create a PDF file using XMLWorkerHelper and return it as a byte array:

private byte[] getPdfDocFrom(string htmlString, string cssString)
    {
        var tempMs = new MemoryStream();
        byte[] tempMsBytes;
        var tempDoc = new iTextSharp.text.Document();
        var tempWriter = PdfWriter.GetInstance(tempDoc, tempMs);
        tempDoc.Open();

        using (var msCss = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(cssString)))
        {
            using (var msHtml = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(htmlString)))
            {
                //Parse the HTML
                iTextSharp.tool.xml.XMLWorkerHelper.GetInstance().ParseXHtml(tempWriter, tempDoc, msHtml, msCss);
                tempMsBytes = tempMs.ToArray();
            }
        }

        tempDoc.Close();
        return tempMsBytes;
    }

Later on I try to add pages from this PDF file to the blank one.

private static void addPagesToPdf(PdfCopy mainDocWriter, byte[] sourceDocBytes)
{
    PdfReader reader = new PdfReader(new MemoryStream(sourceDocBytes));
    int n = reader.NumberOfPages;
    PdfImportedPage page;
    for (int i = 1; i <= n; i++)
    {
        page = mainDocWriter.GetImportedPage(reader, i);
        mainDocWriter.AddPage(page);
    }
}

It breaks when it tries to create a PdfReader from the byte array I pass to the function. "Rebuild failed: trailer not found.; Original message: PDF startxref not found."

I used another library to work with PDF before. I passed two PdfDocuments as objects and simply added pages from one to the other in a loop. It didn't support CSS though, so I had to switch to iTextSharp.

I don't quite get the difference between PdfWriter and PdfCopy.

Recommended answer

There is a logical error in your code. When you create a document from scratch, as is done in the getPdfDocFrom() method, the document isn't complete until you've triggered the Close() method. In this Close() method, a trailer is created as well as a cross-reference (xref) table. The error tells you that those are missing.

Indeed, you do call the Close() method:

tempDoc.Close();

But by the time you Close() the document, it's too late: you have already created the tempMsBytes array. You need to create that array after you close the document.
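A minimal sketch of that reordering, assuming iTextSharp 5 with the XMLWorker add-on and keeping the question's variable names: the only substantive change is that the byte array is taken after tempDoc.Close(), once the trailer and xref table have been written. The wrapper class name HtmlToPdf is made up for illustration.

```csharp
using System.IO;
using System.Text;
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.tool.xml;

// Hypothetical wrapper class; only the method body matters.
static class HtmlToPdf
{
    public static byte[] GetPdfDocFrom(string htmlString, string cssString)
    {
        using (var tempMs = new MemoryStream())
        {
            var tempDoc = new Document();
            var tempWriter = PdfWriter.GetInstance(tempDoc, tempMs);
            tempDoc.Open();

            using (var msCss = new MemoryStream(Encoding.UTF8.GetBytes(cssString)))
            using (var msHtml = new MemoryStream(Encoding.UTF8.GetBytes(htmlString)))
            {
                // Parse the HTML into the open document.
                XMLWorkerHelper.GetInstance().ParseXHtml(tempWriter, tempDoc, msHtml, msCss);
            }

            // Close() finishes the PDF: it writes the xref table and the trailer.
            tempDoc.Close();

            // Read the bytes only now, after the document is complete.
            return tempMs.ToArray();
        }
    }
}
```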

Edit: I don't know anything about C#, but if MemoryStream clears its buffer after it is closed, you could set CloseStream = false on the writer so that the MemoryStream isn't closed when you close the document.

In Java, it would be a bad idea to set the "close stream" parameter to false. When I read the answers to the question "Create PDF in memory instead of physical file", I see that C# probably doesn't always require this extra line.
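On the C# side, the .NET documentation for MemoryStream.ToArray() states that the method works when the stream is closed, so the buffer can still be read after the document (and with it the stream) has been closed. A small check using only the .NET base library, with a made-up class name Demo:

```csharp
using System;
using System.IO;

class Demo
{
    static void Main()
    {
        var ms = new MemoryStream();
        ms.WriteByte(0x25);   // write a single byte ('%')
        ms.Close();           // close the stream, as Document.Close() would by default

        // ToArray() is documented to work on a closed MemoryStream.
        byte[] bytes = ms.ToArray();
        Console.WriteLine(bytes.Length);   // prints 1
    }
}
```

If that holds for the runtime in use, the CloseStream = false line is not needed for a MemoryStream; it may still matter for other stream types.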

Remark: merging files by adding PdfImportedPage instances to a PdfWriter is an example of bad taste. If you are using iTextSharp 5 or earlier, you should use PdfCopy or PdfSmartCopy to do that. If you use PdfWriter, you throw away a lot of information (e.g. link annotations).
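As a rough sketch of a PdfCopy-based merge, assuming iTextSharp 5: the target document has to be opened before pages are added, and the merged bytes are taken only after it is closed. The class name PdfMerger and method name Merge are made up for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

// Hypothetical helper; names are illustrative only.
static class PdfMerger
{
    public static byte[] Merge(IEnumerable<byte[]> sourcePdfs)
    {
        using (var ms = new MemoryStream())
        {
            var doc = new Document();
            var copy = new PdfCopy(doc, ms);
            doc.Open();   // must be open before AddPage is called

            foreach (byte[] sourceBytes in sourcePdfs)
            {
                var reader = new PdfReader(sourceBytes);
                for (int i = 1; i <= reader.NumberOfPages; i++)
                {
                    // PdfCopy imports the page as it exists in the source file.
                    copy.AddPage(copy.GetImportedPage(reader, i));
                }
                copy.FreeReader(reader);
                reader.Close();
            }

            // Closing writes the xref table and trailer of the merged file.
            // Note: iTextSharp throws if no pages were added.
            doc.Close();
            return ms.ToArray();
        }
    }
}
```

Each element of sourcePdfs is expected to be a complete PDF, that is, bytes obtained after the source document was closed.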
