Using itextsharp to split a pdf into smaller pdf's based on size


Problem Description


So we have some really inefficient code that splits a PDF into smaller chunks based on a maximum allowed size. For example, if the max size is 10 MB, an 8 MB file would be skipped, while a 16 MB file would be split based on the number of pages.


This is code that I inherited, and I feel like there has to be a more efficient way to do this, ideally requiring only one method and fewer object instantiations.


We use the following code to call the methods:

        List<int> splitPoints = null;
        List<byte[]> documents = null;

        splitPoints = this.GetPDFSplitPoints(currentDocument, maxSize);
        documents = this.SplitPDF(currentDocument, maxSize, splitPoints);

Methods:

    private List<int> GetPDFSplitPoints(IClaimDocument currentDocument, int maxSize)
    {
        List<int> splitPoints = new List<int>();
        PdfReader reader = null;
        Document document = null;
        int pagesRemaining = currentDocument.Pages;

        while (pagesRemaining > 0)
        {
            reader = new PdfReader(currentDocument.Data);
            document = new Document(reader.GetPageSizeWithRotation(1));

            using (MemoryStream ms = new MemoryStream())
            {
                PdfCopy copy = new PdfCopy(document, ms);
                PdfImportedPage page = null;

                document.Open();

                //Add pages until we run out from the original
                for (int i = 0; i < currentDocument.Pages; i++)
                {
                    int currentPage = currentDocument.Pages - (pagesRemaining - 1);

                    if (pagesRemaining == 0)
                    {
                        //The whole document has been traversed
                        break;
                    }

                    page = copy.GetImportedPage(reader, currentPage);
                    copy.AddPage(page);

                    //If the current collection of pages exceeds the maximum size, we save off the index and start again
                    if (copy.CurrentDocumentSize > maxSize)
                    {
                        if (i == 0)
                        {
                            //One page is greater than the maximum size
                            throw new Exception("one page is greater than the maximum size and cannot be processed");
                        }

                        //We have gone one page too far, save this split index   
                        splitPoints.Add(currentDocument.Pages - (pagesRemaining - 1));
                        break;
                    }
                    else
                    {
                        pagesRemaining--;
                    }
                }

                page = null;

                document.Close();
                document.Dispose();
                copy.Close();
                copy.Dispose();
                copy = null;
            }
        }

        if (reader != null)
        {
            reader.Close();
            reader = null;
        }

        document = null;

        return splitPoints;
    }

    private List<byte[]> SplitPDF(IClaimDocument currentDocument, int maxSize, List<int> splitPoints)
    {
        var documents = new List<byte[]>();
        PdfReader reader = null;
        Document document = null;
        MemoryStream fs = null;
        int pagesRemaining = currentDocument.Pages;

        while (pagesRemaining > 0)
        {
            reader = new PdfReader(currentDocument.Data);
            document = new Document(reader.GetPageSizeWithRotation(1));

            fs = new MemoryStream();
            PdfCopy copy = new PdfCopy(document, fs);
            PdfImportedPage page = null;

            document.Open();

            //Add pages until we run out from the original
            for (int i = 0; i <= currentDocument.Pages; i++)
            {
                int currentPage = currentDocument.Pages - (pagesRemaining - 1);
                if (pagesRemaining == 0)
                {
                    //We have traversed all pages
                    //The call to copy.Close() MUST come before using fs.ToArray() because copy.Close() finalizes the document
                    fs.Flush();
                    copy.Close();
                    documents.Add(fs.ToArray());
                    document.Close();
                    fs.Dispose();
                    break;
                }

                page = copy.GetImportedPage(reader, currentPage);
                copy.AddPage(page);
                pagesRemaining--;

                if (splitPoints.Contains(currentPage + 1) == true)
                {
                    //Need to start a new document
                    //The call to copy.Close() MUST come before using fs.ToArray() because copy.Close() finalizes the document
                    fs.Flush();
                    copy.Close();
                    documents.Add(fs.ToArray());
                    document.Close();
                    fs.Dispose();
                    break;
                }
            }

            copy = null;
            page = null;

            fs.Dispose();
        }

        if (reader != null)
        {
            reader.Close();
            reader = null;
        }

        if (document != null)
        {
            document.Close();
            document.Dispose();
            document = null;
        }

        if (fs != null)
        {
            fs.Close();
            fs.Dispose();
            fs = null;
        }

        return documents;
    }


As far as I can tell, the only code I can find online is in VB, and it doesn't necessarily address the size issue.

Update


We're experiencing OutOfMemory exceptions, and I believe it's an issue with the Large Object Heap. So one thought was to reduce the code's footprint, which might reduce the number of large objects on the heap.
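One assumption worth testing (the question itself doesn't confirm it) is that the large allocations come from the repeated new PdfReader(currentDocument.Data) calls inside both while loops, each of which re-parses the entire source byte array, plus the fs.ToArray() copy made for every chunk. A minimal sketch of the reader side, assuming iTextSharp 5's RandomAccessFileOrArray/partial-read constructor, so the source is parsed once per document rather than once per chunk:

    //Construct the reader once per source document and reuse it for every chunk,
    //instead of calling new PdfReader(currentDocument.Data) in every pass of the while loop.
    //The RandomAccessFileOrArray overload defers most parsing until pages are actually imported.
    PdfReader reader = new PdfReader(new RandomAccessFileOrArray(currentDocument.Data), null);

    try
    {
        //...run both the split-point pass and the copy pass against this single reader...
    }
    finally
    {
        reader.Close();
    }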


Basically this is part of a loop that goes through any number of PDFs, splits them, and stores them in the database. For now, we've had to change the process from doing all of them at once (the last run was 97 PDFs of various sizes) to running 5 PDFs through the system every 5 minutes. This is not ideal and won't scale well when we ramp the tool up to more clients.


(We're dealing with 50-100 MB PDFs, but they could be larger.)

Answer


I also inherited this exact code, and there appears to be a major flaw in it. In the GetPDFSplitPoints method, it's checking the total size of the copied pages against maxsize to determine at which page to split the file.
In the SplitPDF method, when it reaches the page where the split occurs, sure enough the MemoryStream at that point is below the maximum size allowed, and one more page would put it over the limit. But after document.Close(); executes, much more is added to the MemoryStream (in one example PDF I worked with, the Length of the MemoryStream grew from 9 MB before document.Close to 19 MB after it). My understanding is that all the resources needed by the copied pages are added on Close.
I'm guessing I'll have to rewrite this code completely to ensure I don't exceed the max size while retaining the integrity of the original pages.
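For reference, here is a minimal sketch of the kind of single-pass rewrite that follows from that observation, assuming the same iTextSharp classes used above (PdfReader, Document, PdfCopy). It grows each chunk one page at a time and checks the size of the finished (closed) copy, so the limit is enforced against the real output rather than against CurrentDocumentSize. SplitBySize and CopyPages are hypothetical helper names, not part of the original code:

    //Requires: using System; using System.Collections.Generic; using System.IO;
    //using iTextSharp.text; using iTextSharp.text.pdf;

    private static List<byte[]> SplitBySize(byte[] pdfBytes, long maxSize)
    {
        var chunks = new List<byte[]>();
        PdfReader reader = new PdfReader(pdfBytes);

        try
        {
            int first = 1;

            while (first <= reader.NumberOfPages)
            {
                byte[] lastGood = null;
                int pagesInChunk = 0;

                //Grow the chunk one page at a time; stop as soon as the finished copy overshoots
                for (int last = first; last <= reader.NumberOfPages; last++)
                {
                    byte[] candidate = CopyPages(reader, first, last);

                    if (candidate.Length > maxSize)
                    {
                        break;
                    }

                    lastGood = candidate;
                    pagesInChunk = last - first + 1;
                }

                if (lastGood == null)
                {
                    //Mirrors the original behaviour when a single page is already too large
                    throw new Exception("one page is greater than the maximum size and cannot be processed");
                }

                chunks.Add(lastGood);
                first += pagesInChunk;
            }
        }
        finally
        {
            reader.Close();
        }

        return chunks;
    }

    private static byte[] CopyPages(PdfReader reader, int from, int to)
    {
        using (MemoryStream ms = new MemoryStream())
        {
            Document document = new Document(reader.GetPageSizeWithRotation(from));
            PdfCopy copy = new PdfCopy(document, ms);

            document.Open();

            for (int p = from; p <= to; p++)
            {
                copy.AddPage(copy.GetImportedPage(reader, p));
            }

            //Closing the Document finalizes the copy; only now does ms hold the complete PDF
            document.Close();

            return ms.ToArray();
        }
    }

The trade-off is that each chunk is re-copied once per candidate page, so it spends CPU to guarantee the size; a production version could start from an estimate (for example the split points the existing GetPDFSplitPoints pass produces) and only back off when a finished chunk overshoots.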

