4000 files that I want to do a string search on


Question

What is the best way to search for strings in multiple files?

Currently I loop over each file with foreach, but it takes 4-5 minutes to go through all 4000+ files.

Is there some way to do this in parallel?

Recommended answer

The best way to do this is the producer-consumer pattern. One thread reads from the hard drive and loads the data into a queue, and any number of other threads take items off the queue and process them.

So say your old code was this:

foreach (var file in Directory.GetFiles(someSearch))
{
    string textToRead = File.ReadAllText(file);
    ProcessText(textToRead);
}

The new code would be:

// Needs: using System.Collections.Concurrent; using System.IO; using System.Threading.Tasks;
var collection = new BlockingCollection<string>(); // You may want to set a max size so you don't use up all your memory

Task producer = Task.Run(() =>
{
    try
    {
        foreach (var file in Directory.GetFiles(someSearch))
        {
            collection.Add(File.ReadAllText(file));
        }
    }
    finally
    {
        collection.CompleteAdding(); // Always signal completion, even if a read fails, so the consumers can finish
    }
});

// Make sure anything ProcessText does (like incrementing variables in the class) is done in a thread-safe manner.
Parallel.ForEach(collection.GetConsumingEnumerable(), ProcessText);



This lets a single thread read from the hard drive without competing with other threads for I/O, while multiple threads process the data that has already been read, all at the same time.
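For a more complete picture, here is a minimal sketch of the same producer-consumer idea applied to the actual string search. It is only one possible arrangement, not the answer's own code: the names FileSearch and FindFilesContaining, the capacity of 64, and the choice to return matching paths are all assumptions made for illustration. It also swaps in Directory.EnumerateFiles and a non-buffering partitioner, which the answer does not mention, so that items are handed to workers as soon as they are queued.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

public static class FileSearch
{
    // Hypothetical helper: returns the paths of files in `directory` matching
    // `pattern` whose text contains `needle` (ordinal, case-sensitive).
    public static List<string> FindFilesContaining(string directory, string pattern, string needle)
    {
        var matches = new ConcurrentBag<string>();

        // Bounded (64 is an arbitrary assumption) so the reader blocks when the
        // consumers fall behind instead of loading every file into memory.
        using (var queue = new BlockingCollection<(string Path, string Text)>(64))
        {
            // Producer: a single task does all the disk reads.
            Task producer = Task.Run(() =>
            {
                try
                {
                    foreach (var path in Directory.EnumerateFiles(directory, pattern))
                    {
                        queue.Add((path, File.ReadAllText(path)));
                    }
                }
                finally
                {
                    // Signal completion even if a read throws, so consumers stop waiting.
                    queue.CompleteAdding();
                }
            });

            // NoBuffering makes Parallel.ForEach take one item at a time
            // rather than grabbing chunks from the queue.
            var source = Partitioner.Create(
                queue.GetConsumingEnumerable(),
                EnumerablePartitionerOptions.NoBuffering);

            // Consumers: search each file's text in parallel.
            Parallel.ForEach(source, item =>
            {
                if (item.Text.IndexOf(needle, StringComparison.Ordinal) >= 0)
                {
                    matches.Add(item.Path); // ConcurrentBag is safe for concurrent adds
                }
            });

            producer.Wait(); // Rethrows any exception raised while reading files
        }

        var result = new List<string>(matches);
        result.Sort(StringComparer.Ordinal);
        return result;
    }
}
```

A call such as FileSearch.FindFilesContaining(someDirectory, "*.txt", "needle") would then return the sorted list of matching paths. Whether this beats the plain foreach loop depends on how expensive the per-file processing is compared with the disk reads, so it is worth measuring on the real files.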
