Reading a lot of files, need a quicker way

Problem description

Hello all,

I am currently writing a program to search for files relating to different things (e.g. ECRs, RMSs, drawings, etc.) on a system. One part of the program (searching for ECRs) requires me to read around 450 PDF files and search within them for a specified product name, for example "NPS-0243", as the name of a file bears no relation to the products it references. I have achieved this with a background worker that starts when the program starts, reads the files, and adds their text to a string array that the program can search. The problem is that the array is not completely filled with the text from the files until about 2 minutes after the program starts, which is a problem for a user who does not want to sit and wait 2 minutes to search for a certain type of file.

My question therefore is: is there any way of speeding this process up? I have tried writing the text from each file to one .txt file, then reading it back and searching for the end of each file's text before adding it to the array, but this is, if anything, slower than the previous method. In my opinion, I'm going to have to live with the 2 minute wait, but then again I'm not as clever as some of you!

Any help would be greatly appreciated.

Here is the background worker code I am currently using:

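// Requires: using System.ComponentModel; using System.Text;
//           using iTextSharp.text.pdf; using iTextSharp.text.pdf.parser;
public void backgroundWorker1_DoWork(object sender, DoWorkEventArgs e)
{
    int i = 0;
    foreach (string thisfile in filePaths)
    {
        // Skip the master log and the Windows thumbnail cache.
        if (thisfile.Contains(".MASTER ECR LOG") || thisfile.Contains("Thumbs.db"))
            continue;

        i++;
        StringBuilder strText = new StringBuilder();
        // Open each PDF once per file instead of once per page.
        PdfReader reader = new PdfReader(thisfile);
        try
        {
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                // A fresh strategy per page, so text does not carry over between pages.
                ITextExtractionStrategy its = new SimpleTextExtractionStrategy();
                string s = PdfTextExtractor.GetTextFromPage(reader, page, its);
                s = Encoding.UTF8.GetString(Encoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(s)));
                strText.Append(s);
            }
        }
        finally
        {
            // Always release the file, even if a page fails to parse.
            reader.Close();
        }
        // Store the full text and the path once the whole file has been read.
        data[i, 1] = strText.ToString();
        data[i, 2] = thisfile;
    }
}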

Recommended answer

Have you considered indexing them as a background task?
That way, you just have to search the index.
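
One way to read this suggestion is an inverted index: a map from each word to the files that contain it, built once in the background so that a search becomes a dictionary lookup instead of a scan through every file's text. The sketch below is only an illustration under assumptions. `TextIndex`, `AddDocument` and `Search` are hypothetical names, not part of iTextSharp, and the text of each PDF is assumed to have been extracted already (for example by the loop in the question).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: maps each token to the set of files that contain it.
public class TextIndex
{
    // A token is a run of letters and digits, optionally joined by hyphens,
    // so a product name such as "NPS-0243" is kept as a single token.
    private static readonly Regex Token = new Regex(@"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*");

    // Case-insensitive keys, so "nps-0243" and "NPS-0243" hit the same entry.
    private readonly Dictionary<string, HashSet<string>> postings =
        new Dictionary<string, HashSet<string>>(StringComparer.OrdinalIgnoreCase);

    // Call once per file, passing the text already extracted from that file.
    public void AddDocument(string path, string text)
    {
        foreach (Match m in Token.Matches(text))
        {
            HashSet<string> files;
            if (!postings.TryGetValue(m.Value, out files))
            {
                files = new HashSet<string>();
                postings[m.Value] = files;
            }
            files.Add(path);
        }
    }

    // Returns the files containing the exact token (empty if there are none).
    public ICollection<string> Search(string term)
    {
        HashSet<string> files;
        return postings.TryGetValue(term, out files) ? files : new HashSet<string>();
    }
}
```

With something like this, the background worker would call `AddDocument(thisfile, text)` for each file, and the search box would call `Search("NPS-0243")`. Note that this sketch matches whole tokens only, not arbitrary substrings. If the index were also saved to disk and refreshed only for files that have changed, the 2 minute wait might be paid once rather than on every start, though that part is not shown here.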

