ITextSharp taking too much time in getting Number of Pages

Problem description

I have this piece of code:

foreach(string pdfFile in Directory.EnumerateFiles(selectedFolderMulti_txt.Text,"*.pdf",SearchOption.AllDirectories))
{
    //filePath = pdfFile.FullName;
    //string abc = Path.GetFileName(pdfFile);
    try
    {
        //pdfReader = new iTextSharp.text.pdf.PdfReader(filePath);
        pdfReader = new iTextSharp.text.pdf.PdfReader(pdfFile);
        rownum = pdfListMulti_gridview.Rows.Add();
        pdfListMulti_gridview.Rows[rownum].Cells[0].Value = counter++;
        //pdfListMulti_gridview.Rows[rownum].Cells[1].Value = pdfFile.Name;
        pdfListMulti_gridview.Rows[rownum].Cells[1].Value = System.IO.Path.GetFileName(pdfFile);
        pdfListMulti_gridview.Rows[rownum].Cells[2].Value = pdfReader.NumberOfPages;
        //pdfListMulti_gridview.Rows[rownum].Cells[3].Value = filePath;
        pdfListMulti_gridview.Rows[rownum].Cells[3].Value = pdfFile;
        //totalpages += pdfReader.NumberOfPages;
    }
    catch
    {
        //MessageBox.Show("There was an error while opening '" + pdfFile.Name + "'", "Error!", MessageBoxButtons.OK, MessageBoxIcon.Error);
        MessageBox.Show("There was an error while opening '" + System.IO.Path.GetFileName(pdfFile) + "'", "Error!", MessageBoxButtons.OK, MessageBoxIcon.Error);
    }
}

The problem is that today, when I specified a folder containing about 4,000 PDF files, it took about 20 minutes to read all the files and show the results. That made me wonder what this code will do when I point it at a folder with more than 20,000 files.

If I comment out this line:

pdfListMulti_gridview.Rows[rownum].Cells[2].Value = pdfReader.NumberOfPages;

Then it seems as if all of the processing burden is removed from the code.

So what I'd like from you is a suggestion for making my approach more efficient, so that processing all the files takes less time. Or is there an alternative?

Solution

Definitely do what @ChrisBint said; that will get past Windows' slowness with folders containing many files.

But to get even more speed, make sure to use the overload of PdfReader that takes a RandomAccessFileOrArray object instead of a file path. In all of my tests this object was far faster than regular streams. Its constructor has a couple of overloads, but the one you should mainly concern yourself with is RandomAccessFileOrArray(string filename, bool forceRead). The second parameter controls whether the entire file is loaded into memory (if I'm understanding the documentation correctly). For very large files this might be a performance hit, but on modern machines it shouldn't matter much, so I recommend passing true. If you pass false, the disk has to be hit several times as the parsing "cursor" walks through the file.

So with all of that, you can do this in a very tight loop. For me, 4,000 files containing a total of over 42,000 pages take about 2 seconds to run.

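        // Enumerate lazily so processing can start before the whole folder has been listed.
        var files = Directory.EnumerateFiles(workingFolder, "*.pdf");
        int totalPageCount = 0;
        foreach (string f in files)
        {
            // forceRead = true loads the file into memory up front; null means no owner password.
            var reader = new PdfReader(new RandomAccessFileOrArray(f, true), null);
            totalPageCount += reader.NumberOfPages;
            reader.Close(); // release the reader before moving on to the next file
        }
        MessageBox.Show(String.Format("Total Page Count : {0:N0}", totalPageCount));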
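
To connect this back to the question, here is a minimal sketch of how the same tight loop might be combined with the question's recursive search and per-file error handling. It assumes an iTextSharp 5.x build in which the RandomAccessFileOrArray(string, bool) constructor used above is available (newer releases may flag it as obsolete, so check your version). PdfPageCounter, CountPages and the -1 marker for unreadable files are hypothetical names and conventions chosen for illustration; they are not part of iTextSharp.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using iTextSharp.text.pdf;

// Hypothetical helper class, not part of iTextSharp.
static class PdfPageCounter
{
    // Returns one (file path, page count) pair per PDF found under 'folder'.
    // A page count of -1 marks a file that could not be opened or parsed.
    public static List<KeyValuePair<string, int>> CountPages(string folder)
    {
        var results = new List<KeyValuePair<string, int>>();

        foreach (string path in Directory.EnumerateFiles(folder, "*.pdf", SearchOption.AllDirectories))
        {
            int pages = -1;
            try
            {
                // forceRead = true: read the whole file into memory first, as the answer recommends.
                var reader = new PdfReader(new RandomAccessFileOrArray(path, true), null);
                try
                {
                    pages = reader.NumberOfPages;
                }
                finally
                {
                    reader.Close();
                }
            }
            catch (Exception)
            {
                // Mirrors the question's bare catch: keep -1 and carry on with the next file.
            }

            results.Add(new KeyValuePair<string, int>(path, pages));
        }

        return results;
    }
}
```

Collecting the results first keeps the file I/O separate from the UI. The grid from the question could then be filled in a single pass over the returned list, and the files that came back with -1 could be reported in one message at the end instead of one MessageBox per failure.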
