Concurrent reading of a File (java preferred)

Question

I have a large file that takes multiple hours to process. So I am thinking of trying to estimate chunks and read the chunks in parallel.

Is it possible to read a single file concurrently? I have looked at both RandomAccessFile and nio.FileChannel, but based on other posts I am not sure whether this approach would work.

Recommended answer

The most important question here is what the bottleneck is in your case.

If the bottleneck is your disk IO, then there isn't much you can do on the software side. Parallelizing the computation is then likely to make things worse, especially on a spinning disk, because reading the file from different parts simultaneously forces extra seeks and degrades disk performance.
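
One rough way to tell which case applies is to time a pass that only reads the file and discards the data, then compare it with the time of the full processing run. The sketch below is an illustration, not part of the original answer; the class name ReadOnlyTiming and the method readAll are hypothetical. Keep in mind that the operating system may cache the file, so a second run can look much faster than the first.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: measures how long a plain sequential read takes.
public class ReadOnlyTiming {

    // Reads the whole file sequentially, discards the data,
    // and returns the number of bytes read.
    static long readAll(Path path) throws IOException {
        byte[] buffer = new byte[1 << 16];
        long total = 0;
        try (InputStream in = Files.newInputStream(path)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // With no argument, fall back to an empty temporary file so the demo still runs.
        Path path = args.length > 0
                ? Paths.get(args[0])
                : Files.createTempFile("timing", ".bin");
        long t0 = System.nanoTime();
        long bytes = readAll(path);
        double seconds = (System.nanoTime() - t0) / 1e9;
        System.out.printf("Read %d bytes in %.3f s%n", bytes, seconds);
    }
}
```

If the read-only pass takes nearly as long as the full run, the disk is probably the limit; if it is much shorter, the processing is, and splitting the work across threads has a better chance of helping.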

If the bottleneck is processing power, and you have multiple CPU cores, then you can take advantage of them by starting multiple threads to work on different parts of the file. You can safely create several InputStreams or Readers to read different parts of the file in parallel (as long as you don't go over your operating system's limit for the number of open files). You could separate the work into tasks and run them in parallel, like in this example:

import java.io.*;
import java.util.*;
import java.util.concurrent.*;

public class Split {
    private File file;

    public Split(File file) {
        this.file = file;
    }

    // Processes the given portion of the file.
    // Called simultaneously from several threads.
    // Use your custom return type as needed, I used String just to give an example.
    public String processPart(long start, long end)
        throws Exception
    {
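        // try-with-resources closes the stream even if the computation throws.
        try (InputStream is = new FileInputStream(file)) {
            // skip() may skip fewer bytes than requested, so loop until start is reached.
            long remaining = start;
            while (remaining > 0) {
                long skipped = is.skip(remaining);
                if (skipped <= 0)
                    throw new EOFException("Could not skip to offset " + start);
                remaining -= skipped;
            }
            // do a computation using the input stream,
            // checking that we don't read more than (end-start) bytes
            System.out.println("Computing the part from " + start + " to " + end);
            Thread.sleep(1000);
            System.out.println("Finished the part from " + start + " to " + end);
            return "Some result";
        }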
    }

    // Creates a task that will process the given portion of the file,
    // when executed.
    public Callable<String> processPartTask(final long start, final long end) {
        return new Callable<String>() {
            public String call()
                throws Exception
            {
                return processPart(start, end);
            }
        };
    }

    // Splits the computation into chunks of the given size,
    // creates appropriate tasks and runs them using a 
    // given number of threads.
    public void processAll(int noOfThreads, int chunkSize)
        throws Exception
    {
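        long length = file.length();
        int count = (int)((length + chunkSize - 1) / chunkSize);
        java.util.List<Callable<String>> tasks = new ArrayList<Callable<String>>(count);
        for(int i = 0; i < count; i++) {
            // Cast to long before multiplying: int arithmetic would overflow for files over 2 GB.
            long start = (long) i * chunkSize;
            tasks.add(processPartTask(start, Math.min(length, start + chunkSize)));
        }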
        ExecutorService es = Executors.newFixedThreadPool(noOfThreads);

        java.util.List<Future<String>> results = es.invokeAll(tasks);
        es.shutdown();

        // use the results for something
        for(Future<String> result : results)
            System.out.println(result.get());
    }

    public static void main(String argv[])
        throws Exception
    {
        Split s = new Split(new File(argv[0]));
        s.processAll(8, 1000);
    }
}
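
Since the question mentions nio.FileChannel: a single FileChannel can also be shared between threads, because the positional read(ByteBuffer, long) method does not move the channel's own position. The following is a minimal sketch under that approach, not the method from the answer above. The class name ChunkedNewlineCount and its methods are hypothetical, and counting newline bytes is only a stand-in for the real per-chunk computation.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example class: counts '\n' bytes in a file, chunk by chunk, in parallel.
public class ChunkedNewlineCount {

    // Counts '\n' bytes in [start, end) using positional reads,
    // which leave the shared channel position untouched.
    static long countNewlines(FileChannel channel, long start, long end) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(64 * 1024);
        long position = start;
        long count = 0;
        while (position < end) {
            buffer.clear();
            // Never read past the end of this chunk.
            buffer.limit((int) Math.min(buffer.capacity(), end - position));
            int read = channel.read(buffer, position);
            if (read < 0) break; // file ended earlier than expected
            for (int i = 0; i < read; i++) {
                if (buffer.get(i) == '\n') count++;
            }
            position += read;
        }
        return count;
    }

    // Splits the file into chunks of chunkSize bytes and sums the per-chunk counts.
    static long countAll(Path path, int noOfThreads, long chunkSize) throws Exception {
        ExecutorService es = Executors.newFixedThreadPool(noOfThreads);
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            long length = channel.size();
            List<Callable<Long>> tasks = new ArrayList<>();
            for (long start = 0; start < length; start += chunkSize) {
                final long from = start;
                final long to = Math.min(length, start + chunkSize);
                tasks.add(() -> countNewlines(channel, from, to));
            }
            long total = 0;
            // invokeAll returns only after every task has finished,
            // so the channel is still open while the tasks run.
            for (Future<Long> f : es.invokeAll(tasks)) total += f.get();
            return total;
        } finally {
            es.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Path path;
        if (args.length > 0) {
            path = Paths.get(args[0]);
        } else {
            // Demo input when no file is given.
            path = Files.createTempFile("demo", ".txt");
            Files.write(path, "a\nb\nc\n".getBytes(StandardCharsets.US_ASCII));
        }
        System.out.println(countAll(path, 4, 1 << 20));
    }
}
```

Counting bytes works with arbitrary chunk boundaries. If the real computation is record-based (for example, one record per line), the chunk boundaries would have to be moved to the nearest record delimiter so that no record is split between two tasks.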
