Using threadpools/threading for reading large txt files?

Question

On a previous question of mine I posted:

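I have to read a few very large txt files, and I have to use either multiple threads or a single thread to do so, depending on user input.
Say I have a main method that gets user input, and the user requests a single thread and wants to process 20 txt files with that thread. How would I accomplish this? Note that the code below isn't my actual code or its setup, just the idea.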

Example:

int numFiles = 20;
int threads = 1;

String[] list = new String[numFiles];
for (int i = 0; i < numFiles; i++) {
    // Array indices run 0..19; the names run hello1.txt, hello2.txt, ..., hello20.txt
    list[i] = "hello" + (i + 1) + ".txt";
}

public void run() {
    // processes txt file
}

In short, how would I do this with a single thread? With 20 threads?

And a user suggested using threadPools:

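When the user specifies the number of threads to use, you can configure the pool appropriately, submit the set of file-read jobs, and let the pool sort out the execution.
In the Java world, you would use the Executors.newFixedThreadPool factory method and submit each job as a Callable. Here is an IBM article on Java thread pools.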
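
A minimal sketch of that suggestion, assuming each job can be written as a `Callable` that returns a result. The per-file work here (counting lines) is a hypothetical stand-in for real processing, and `CallablePerFile`, `lineCountJob`, and `countLinesAll` are illustrative names, not part of the original posts. The pool size is taken from the user's thread count.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Stream;

public class CallablePerFile {

    // Hypothetical per-file job: reads one file and returns its line count.
    static Callable<Long> lineCountJob(Path file) {
        return () -> {
            try (Stream<String> lines = Files.lines(file)) {
                return lines.count();
            }
        };
    }

    // Submits one job per file to a pool sized from the user's thread count and
    // returns the results in the same order as the input files.
    static List<Long> countLinesAll(List<Path> files, int numThreads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            List<Callable<Long>> jobs = new ArrayList<>();
            for (Path file : files) {
                jobs.add(lineCountJob(file));
            }
            // invokeAll blocks until every job has completed (normally or with an exception).
            List<Future<Long>> futures = pool.invokeAll(jobs);
            List<Long> counts = new ArrayList<>();
            for (Future<Long> future : futures) {
                // get() rethrows a job's failure (e.g. an IOException) wrapped in ExecutionException.
                counts.add(future.get());
            }
            return counts;
        } finally {
            pool.shutdown();
        }
    }
}
```

With `numThreads = 1` the files are processed one after another by the single worker; with `numThreads = 20`, each of 20 files can get its own worker. Each job opens its own file, so with more than one thread several files are read from disk concurrently.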

So now I have a method called sortAndMap(String x) which takes in a txt file name and does the processing, and for the example above, I would have

ExecutorService pool = Executors.newFixedThreadPool(numThreads);

How do I use this with threadPools so that my example above is doable?
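
The question's own setup can be wired to a pool directly: build the file names from the example, then submit one task per name that calls `sortAndMap`. A minimal sketch, assuming `sortAndMap(String)` is safe to run on several threads at once; the `sortAndMap` body below is a placeholder that only counts calls, and `SortAndMapPool`, `fileNames`, `processAll`, and `processed` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SortAndMapPool {

    // Placeholder for the asker's sortAndMap(String): the real method would open
    // and process the named file. Here it only records that it was called.
    static final AtomicInteger processed = new AtomicInteger();

    static void sortAndMap(String fileName) {
        processed.incrementAndGet();
    }

    // Builds hello1.txt ... helloN.txt, as in the question's example.
    static List<String> fileNames(int numFiles) {
        List<String> names = new ArrayList<>();
        for (int i = 1; i <= numFiles; i++) {
            names.add("hello" + i + ".txt");
        }
        return names;
    }

    // One task per file name; numThreads comes from the user's input.
    static void processAll(List<String> names, int numThreads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        for (String name : names) {
            pool.submit(() -> sortAndMap(name));
        }
        pool.shutdown(); // accept no new tasks; queued tasks still run
        pool.awaitTermination(Long.MAX_VALUE, TimeUnit.SECONDS);
    }
}
```

Because `sortAndMap` returns `void`, the lambda is a `Runnable`; an exception it throws is captured in the discarded `Future` rather than printed, so real code may want to keep the futures and call `get()` on them.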

Answer

Ok, bear with me on this, because I need to explain a few things.

First off, unless you have multiple disks, or a single disk that is an SSD, using more than one thread to read from the disk is not recommended. Many questions on this topic have been posted, and the conclusion has been the same: using multiple threads to read from a single mechanical disk will hurt performance instead of improving it.

This happens because the disk's mechanical head has to keep seeking to the next position to read. With multiple threads, each thread that gets a chance to run directs the head to a different section of the disk, so the head bounces between disk areas inefficiently.

The accepted solution for processing multiple files is a single-producer (one reader thread), multiple-consumer (processing threads) system. The ideal mechanism in this case is a thread pool, with one thread acting as the producer and putting tasks in the pool's queue for the workers to process.

Something like this:

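import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class Main {
    public static void main(String[] args) throws IOException, InterruptedException {
        int numFiles = 20;
        int threads = 4;

        ExecutorService exec = Executors.newFixedThreadPool(threads);

        // The main thread is the single producer: it reads the files one after
        // another (named as in the question) and submits each file's contents to the pool.
        for (int i = 0; i < numFiles; i++) {
            String[] fileContents = Files.readAllLines(Paths.get("hello" + (i + 1) + ".txt"))
                    .toArray(new String[0]);
            exec.submit(new ThreadTask(fileContents));
        }

        exec.shutdown(); // accept no new tasks; already submitted tasks still run
        exec.awaitTermination(Long.MAX_VALUE, TimeUnit.SECONDS);
        // ...
    }
}

class ThreadTask implements Runnable {

    private final String[] fileContents;

    public ThreadTask(String[] fileContents) {
        this.fileContents = fileContents;
    }

    @Override
    public void run() {
        // process the txt file contents
    }
}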
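
One caveat with a single reader feeding `Executors.newFixedThreadPool`: that pool uses an unbounded work queue, so if the reader loads files faster than the workers process them, every loaded file can sit in memory at once. A minimal sketch of one way to cap this, assuming a plain `ThreadPoolExecutor` with a bounded `ArrayBlockingQueue` and `CallerRunsPolicy` (a swapped-in configuration, not part of the original answer). `BoundedReaderPool`, `processAll`, `queueCapacity`, and the line-counting work are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class BoundedReaderPool {

    // The calling thread is the single reader; `threads` workers process the contents.
    // Returns the total number of lines processed (a hypothetical stand-in for real work).
    static long processAll(List<Path> files, int threads, int queueCapacity)
            throws IOException, InterruptedException {
        ThreadPoolExecutor exec = new ThreadPoolExecutor(
                threads, threads,                      // fixed size, like newFixedThreadPool
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(queueCapacity),
                // If the queue is full, the reader runs the task itself, which also
                // pauses further reading until that file has been processed.
                new ThreadPoolExecutor.CallerRunsPolicy());

        AtomicLong totalLines = new AtomicLong();
        try {
            for (Path file : files) {
                String[] fileContents = Files.readAllLines(file).toArray(new String[0]);
                exec.execute(() -> totalLines.addAndGet(fileContents.length));
            }
        } finally {
            exec.shutdown(); // no new tasks; queued tasks still run
        }
        exec.awaitTermination(Long.MAX_VALUE, TimeUnit.SECONDS);
        return totalLines.get();
    }
}
```

The trade-off is that the reader thread occasionally does processing work instead of reading; in return, the number of loaded files held at once stays roughly bounded by `threads + queueCapacity + 1`.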
