Using itextsharp to split a pdf into smaller pdf's based on size


Problem Description


So we have some really inefficient code that splits a PDF into smaller chunks based on a maximum allowed size. For example, if the max size is 10 MB, an 8 MB file would be skipped, while a 16 MB file would be split based on the number of pages.


This is code that I inherited, and I feel like there has to be a more efficient way to do this, ideally requiring only one method and fewer object instantiations.


We use the following code to call the methods:

        List<int> splitPoints = null;
        List<byte[]> documents = null;

        splitPoints = this.GetPDFSplitPoints(currentDocument, maxSize);
        documents = this.SplitPDF(currentDocument, maxSize, splitPoints);

Methods:

    private List<int> GetPDFSplitPoints(IClaimDocument currentDocument, int maxSize)
    {
        List<int> splitPoints = new List<int>();
        PdfReader reader = null;
        Document document = null;
        int pagesRemaining = currentDocument.Pages;

        while (pagesRemaining > 0)
        {
            reader = new PdfReader(currentDocument.Data);
            document = new Document(reader.GetPageSizeWithRotation(1));

            using (MemoryStream ms = new MemoryStream())
            {
                PdfCopy copy = new PdfCopy(document, ms);
                PdfImportedPage page = null;

                document.Open();

                //Add pages until we run out from the original
                for (int i = 0; i < currentDocument.Pages; i++)
                {
                    int currentPage = currentDocument.Pages - (pagesRemaining - 1);

                    if (pagesRemaining == 0)
                    {
                        //The whole document has been traversed
                        break;
                    }

                    page = copy.GetImportedPage(reader, currentPage);
                    copy.AddPage(page);

                    //If the current collection of pages exceeds the maximum size, we save off the index and start again
                    if (copy.CurrentDocumentSize > maxSize)
                    {
                        if (i == 0)
                        {
                            //One page is greater than the maximum size
                            throw new Exception("one page is greater than the maximum size and cannot be processed");
                        }

                        //We have gone one page too far, save this split index   
                        splitPoints.Add(currentDocument.Pages - (pagesRemaining - 1));
                        break;
                    }
                    else
                    {
                        pagesRemaining--;
                    }
                }

                page = null;

                document.Close();
                document.Dispose();
                copy.Close();
                copy.Dispose();
                copy = null;
            }
        }

        if (reader != null)
        {
            reader.Close();
            reader = null;
        }

        document = null;

        return splitPoints;
    }

    private List<byte[]> SplitPDF(IClaimDocument currentDocument, int maxSize, List<int> splitPoints)
    {
        var documents = new List<byte[]>();
        PdfReader reader = null;
        Document document = null;
        MemoryStream fs = null;
        int pagesRemaining = currentDocument.Pages;

        while (pagesRemaining > 0)
        {
            reader = new PdfReader(currentDocument.Data);
            document = new Document(reader.GetPageSizeWithRotation(1));

            fs = new MemoryStream();
            PdfCopy copy = new PdfCopy(document, fs);
            PdfImportedPage page = null;

            document.Open();

            //Add pages until we run out from the original
            for (int i = 0; i <= currentDocument.Pages; i++)
            {
                int currentPage = currentDocument.Pages - (pagesRemaining - 1);
                if (pagesRemaining == 0)
                {
                    //We have traversed all pages
                    //The call to copy.Close() MUST come before using fs.ToArray() because copy.Close() finalizes the document
                    fs.Flush();
                    copy.Close();
                    documents.Add(fs.ToArray());
                    document.Close();
                    fs.Dispose();
                    break;
                }

                page = copy.GetImportedPage(reader, currentPage);
                copy.AddPage(page);
                pagesRemaining--;

                if (splitPoints.Contains(currentPage + 1) == true)
                {
                    //Need to start a new document
                    //The call to copy.Close() MUST come before using fs.ToArray() because copy.Close() finalizes the document
                    fs.Flush();
                    copy.Close();
                    documents.Add(fs.ToArray());
                    document.Close();
                    fs.Dispose();
                    break;
                }
            }

            copy = null;
            page = null;

            fs.Dispose();
        }

        if (reader != null)
        {
            reader.Close();
            reader = null;
        }

        if (document != null)
        {
            document.Close();
            document.Dispose();
            document = null;
        }

        if (fs != null)
        {
            fs.Close();
            fs.Dispose();
            fs = null;
        }

        return documents;
    }


As far as I can tell, the only code I can find online is in VB, and it doesn't necessarily address the size issue.

Update


We're experiencing OutOfMemory exceptions, and I believe it's an issue with the Large Object Heap. So one thought was to reduce the code's footprint, which might reduce the number of large objects on the heap.
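One assumption worth testing (the question itself doesn't confirm it) is that the large allocations come from the repeated new PdfReader(currentDocument.Data) calls inside both while loops, each of which re-parses the entire source byte array, plus the fs.ToArray() copy made for every chunk. A minimal sketch of the reader side, assuming iTextSharp 5's RandomAccessFileOrArray/partial-read constructor, so the source is parsed once per document rather than once per chunk:

    //Construct the reader once per source document and reuse it for every chunk,
    //instead of calling new PdfReader(currentDocument.Data) in every pass of the while loop.
    //The RandomAccessFileOrArray overload defers most parsing until pages are actually imported.
    PdfReader reader = new PdfReader(new RandomAccessFileOrArray(currentDocument.Data), null);

    try
    {
        //...run both the split-point pass and the copy pass against this single reader...
    }
    finally
    {
        reader.Close();
    }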


Basically this is part of a loop that goes through any number of PDFs, splits them, and stores them in the database. For now, we've had to change the process from doing all of them at once (the last run was 97 PDFs of various sizes) to running 5 PDFs through the system every 5 minutes. This is not ideal and won't scale well when we ramp the tool up to more clients.


(We're dealing with 50-100 MB PDFs, but they could be larger.)

Answer


I also inherited this exact code, and there appears to be a major flaw in it. In the GetPDFSplitPoints method, it's checking the total size of the copied pages against maxsize to determine at which page to split the file.
In the SplitPDF method, when it reaches the page where the split occurs, sure enough the MemoryStream at that point is below the maximum size allowed, and one more page would put it over the limit. But after document.Close(); executes, much more is added to the MemoryStream (in one example PDF I worked with, the Length of the MemoryStream grew from 9 MB before document.Close to 19 MB after it). My understanding is that all the resources needed by the copied pages are added on Close.
I'm guessing I'll have to rewrite this code completely to ensure I don't exceed the max size while retaining the integrity of the original pages.
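For reference, here is a minimal sketch of the kind of single-pass rewrite that follows from that observation, assuming the same iTextSharp classes used above (PdfReader, Document, PdfCopy). It grows each chunk one page at a time and checks the size of the finished (closed) copy, so the limit is enforced against the real output rather than against CurrentDocumentSize. SplitBySize and CopyPages are hypothetical helper names, not part of the original code:

    //Requires: using System; using System.Collections.Generic; using System.IO;
    //using iTextSharp.text; using iTextSharp.text.pdf;

    private static List<byte[]> SplitBySize(byte[] pdfBytes, long maxSize)
    {
        var chunks = new List<byte[]>();
        PdfReader reader = new PdfReader(pdfBytes);

        try
        {
            int first = 1;

            while (first <= reader.NumberOfPages)
            {
                byte[] lastGood = null;
                int pagesInChunk = 0;

                //Grow the chunk one page at a time; stop as soon as the finished copy overshoots
                for (int last = first; last <= reader.NumberOfPages; last++)
                {
                    byte[] candidate = CopyPages(reader, first, last);

                    if (candidate.Length > maxSize)
                    {
                        break;
                    }

                    lastGood = candidate;
                    pagesInChunk = last - first + 1;
                }

                if (lastGood == null)
                {
                    //Mirrors the original behaviour when a single page is already too large
                    throw new Exception("one page is greater than the maximum size and cannot be processed");
                }

                chunks.Add(lastGood);
                first += pagesInChunk;
            }
        }
        finally
        {
            reader.Close();
        }

        return chunks;
    }

    private static byte[] CopyPages(PdfReader reader, int from, int to)
    {
        using (MemoryStream ms = new MemoryStream())
        {
            Document document = new Document(reader.GetPageSizeWithRotation(from));
            PdfCopy copy = new PdfCopy(document, ms);

            document.Open();

            for (int p = from; p <= to; p++)
            {
                copy.AddPage(copy.GetImportedPage(reader, p));
            }

            //Closing the Document finalizes the copy; only now does ms hold the complete PDF
            document.Close();

            return ms.ToArray();
        }
    }

The trade-off is that each chunk is re-copied once per candidate page, so it spends CPU to guarantee the size; a production version could start from an estimate (for example the split points the existing GetPDFSplitPoints pass produces) and only back off when a finished chunk overshoots.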

