Concurrent reading of a file (Java preferred)


Question

I have a large file that takes multiple hours to process, so I am thinking of splitting it into estimated chunks and reading the chunks in parallel. Is it possible to read a single file concurrently? I have looked at both RandomAccessFile and nio.FileChannel, but based on other posts I am not sure whether this approach would work. Any suggestions?

Recommended answer


The most important question here is what the bottleneck is in your case.

If the bottleneck is your disk I/O, then there isn't much you can do on the software side. Parallelizing the computation will only make things worse, because reading the file from different parts simultaneously will degrade disk performance.
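One rough way to find out which case applies is to time a plain sequential read of the file that does no processing at all, and compare it with the total processing time. The sketch below is only an illustration: the class name `ReadTimer` and the method `drain` are made up for this example, and the 64 KB buffer size is an arbitrary choice. Keep in mind that the operating system may cache the file, so a second run can look much faster than the first.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the original answer.
public class ReadTimer {
    // Reads the whole file sequentially without doing any work on the data
    // and returns the number of bytes read.
    public static long drain(Path path) throws IOException {
        byte[] buffer = new byte[1 << 16]; // 64 KB, an arbitrary buffer size
        long total = 0;
        try (InputStream in = Files.newInputStream(path)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get(args[0]);
        long startNanos = System.nanoTime();
        long bytes = drain(path);
        double seconds = (System.nanoTime() - startNanos) / 1e9;
        System.out.printf("Read %d bytes in %.2f s%n", bytes, seconds);
    }
}
```

If the bare read takes close to the full processing time, the job is probably I/O-bound and extra threads are unlikely to help. If it takes only a small fraction, the time is going into computation and splitting the work may pay off.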

If the bottleneck is processing power and you have multiple CPU cores, then you can take advantage of them by starting multiple threads that work on different parts of the file. You can safely create several InputStreams or Readers to read different parts of the file in parallel (as long as you don't exceed your operating system's limit on the number of open files). You could separate the work into tasks and run them in parallel, as in this example:

import java.io.*;
import java.util.*;
import java.util.concurrent.*;

public class Split {
    private File file;

    public Split(File file) {
        this.file = file;
    }

    // Processes the given portion of the file.
    // Called simultaneously from several threads.
    // Use your custom return type as needed, I used String just to give an example.
    public String processPart(long start, long end)
        throws Exception
    {
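        // try-with-resources closes the stream even if the computation throws.
        try (InputStream is = new FileInputStream(file)) {
            // skip() may skip fewer bytes than requested, so loop until start is reached.
            long toSkip = start;
            while (toSkip > 0) {
                long skipped = is.skip(toSkip);
                if (skipped <= 0)
                    throw new EOFException("Could not skip to offset " + start);
                toSkip -= skipped;
            }
            // do a computation using the input stream,
            // checking that we don't read more than (end-start) bytes
            System.out.println("Computing the part from " + start + " to " + end);
            Thread.sleep(1000);
            System.out.println("Finished the part from " + start + " to " + end);

            return "Some result";
        }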
    }

    // Creates a task that will process the given portion of the file,
    // when executed.
    public Callable<String> processPartTask(final long start, final long end) {
        return new Callable<String>() {
            public String call()
                throws Exception
            {
                return processPart(start, end);
            }
        };
    }

    // Splits the computation into chunks of the given size,
    // creates appropriate tasks and runs them using a 
    // given number of threads.
    public void processAll(int noOfThreads, int chunkSize)
        throws Exception
    {
        long length = file.length();
        int count = (int)((length + chunkSize - 1) / chunkSize);
        java.util.List<Callable<String>> tasks = new ArrayList<Callable<String>>(count);
        for(int i = 0; i < count; i++) {
            // Compute offsets as long; int multiplication overflows for files over 2 GB.
            long start = (long) i * chunkSize;
            tasks.add(processPartTask(start, Math.min(length, start + chunkSize)));
        }
        ExecutorService es = Executors.newFixedThreadPool(noOfThreads);

        java.util.List<Future<String>> results = es.invokeAll(tasks);
        es.shutdown();

        // use the results for something
        for(Future<String> result : results)
            System.out.println(result.get());
    }

    public static void main(String argv[])
        throws Exception
    {
        Split s = new Split(new File(argv[0]));
        s.processAll(8, 1000);
    }
}
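The example above leaves the actual reading as a comment. Since the question mentions nio.FileChannel, here is a minimal sketch of how one chunk could be read without going past its end, using the positional `FileChannel.read(ByteBuffer, long)` call, which does not change the channel's own position. The class name `ChunkReader`, the method `readRange`, and the 8 KB buffer size are assumptions made for illustration; the processing step is still a placeholder.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of the original answer.
public class ChunkReader {
    // Reads the bytes in [start, end) of the file and returns how many
    // bytes were actually read (fewer if the file ends before 'end').
    public static long readRange(Path path, long start, long end) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(8192); // arbitrary buffer size
        long position = start;
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            while (position < end) {
                buffer.clear();
                // Never request more than what remains in this chunk.
                buffer.limit((int) Math.min(buffer.capacity(), end - position));
                // Positional read: leaves the channel's own position untouched.
                int n = channel.read(buffer, position);
                if (n < 0) {
                    break; // reached end of file
                }
                buffer.flip();
                // Process the bytes in 'buffer' here.
                position += n;
            }
        }
        return position - start;
    }
}
```

A call such as `ChunkReader.readRange(path, start, end)` could stand in for the placeholder computation inside `processPart`. For text files a chunk boundary will usually fall in the middle of a line, so each task would also need some rule for handling the partial line at its edges.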
