Java - Read text file by chunks

Question

I want to read a log file in separate chunks so that it can be processed by multiple threads. The application will run in a server-side environment with multiple hard disks. After the file is split into chunks, the app will process each chunk line by line.

I can already read the file line by line with a BufferedReader, and I can split the file into chunks using RandomAccessFile together with MappedByteBuffer, but combining the two isn't easy.

The problem is that a chunk boundary usually falls in the middle of a line, so I never get the whole last line of a chunk and cannot process that log line. I'm trying to find a way to cut the file into variable-length chunks that respect line endings.

Does anyone have code for doing this?

Recommended answer

You could find offsets in the file that fall on line boundaries before you start processing the chunks. For each chunk, start at the evenly spaced position (chunk index times file size, divided by the number of chunks) and read forward until you find a line boundary. Then feed those offsets into your multi-threaded file processor. Here's a complete example that uses the number of available processors as the number of chunks:

import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ReadFileByChunks {
    public static void main(String[] args) throws IOException {
        int chunks = Runtime.getRuntime().availableProcessors();
        long[] offsets = new long[chunks];
        File file = new File("your.file");

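        // determine line boundaries for number of chunks;
        // offsets[0] stays 0, so the first chunk starts at the beginning of the file
        long length = file.length();
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            for (int i = 1; i < chunks; i++) {
                raf.seek(i * length / chunks);

                // read forward to just past the next newline (or to end of file)
                while (true) {
                    int read = raf.read();
                    if (read == '\n' || read == -1) {
                        break;
                    }
                }

                offsets[i] = raf.getFilePointer();
            }
        }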

        // process each chunk using a thread for each one
        ExecutorService service = Executors.newFixedThreadPool(chunks);
        for (int i = 0; i < chunks; i++) {
            long start = offsets[i];
            long end = i < chunks - 1 ? offsets[i + 1] : file.length();
            service.execute(new FileProcessor(file, start, end));
        }
        service.shutdown();
    }

    static class FileProcessor implements Runnable {
        private final File file;
        private final long start;
        private final long end;

        public FileProcessor(File file, long start, long end) {
            this.file = file;
            this.start = start;
            this.end = end;
        }

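        @Override
        public void run() {
            // try-with-resources closes the file even if reading fails
            try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
                raf.seek(start);

                while (raf.getFilePointer() < end) {
                    // note: RandomAccessFile.readLine() maps each byte to one char,
                    // so multi-byte encodings such as UTF-8 are not decoded
                    String line = raf.readLine();
                    if (line == null) {
                        // end of file reached early (e.g. the file shrank);
                        // stop here, since 'continue' would loop forever
                        break;
                    }

                    // do what you need per line here
                    System.out.println(line);
                }
            } catch (IOException e) {
                // deal with exception
                e.printStackTrace();
            }
        }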
    }
}
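
The same idea can also be packaged as two small helper methods, which makes the offset calculation easier to test on its own. The following is only a sketch and not part of the original answer: the class name `ChunkLines` and the methods `lineAlignedOffsets` and `readChunk` are hypothetical, it assumes the file is UTF-8 encoded, and it assumes each chunk is small enough to be loaded into a single byte array (under 2 GB and within the available heap).

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; names are illustrative only.
class ChunkLines {

    // Returns the start offset of each chunk. offsets[0] is 0; every other
    // offset points just past a '\n' byte, or at end of file if none follows.
    static long[] lineAlignedOffsets(File file, int chunks) throws IOException {
        long[] offsets = new long[chunks];
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            long length = raf.length();
            for (int i = 1; i < chunks; i++) {
                raf.seek(i * length / chunks);
                int b;
                do {
                    b = raf.read();
                } while (b != '\n' && b != -1);
                offsets[i] = raf.getFilePointer();
            }
        }
        return offsets;
    }

    // Reads the bytes in [start, end) into memory and splits them into lines.
    // Assumption: the chunk fits in one byte array and the text is UTF-8.
    static List<String> readChunk(File file, long start, long end) throws IOException {
        byte[] bytes = new byte[Math.toIntExact(end - start)];
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            raf.seek(start);
            raf.readFully(bytes);
        }
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Usage: java ChunkLines <path-to-log-file>
    public static void main(String[] args) throws IOException {
        File file = new File(args[0]);
        int chunks = Runtime.getRuntime().availableProcessors();
        long[] offsets = lineAlignedOffsets(file, chunks);
        for (int i = 0; i < chunks; i++) {
            long end = i < chunks - 1 ? offsets[i + 1] : file.length();
            System.out.println("chunk " + i + ": "
                    + readChunk(file, offsets[i], end).size() + " lines");
        }
    }
}
```

Reading a whole chunk with one `readFully` call and decoding it through a `BufferedReader` avoids the byte-at-a-time reads of `RandomAccessFile.readLine()`. Splitting on the `'\n'` byte is safe for UTF-8 because that byte value never occurs inside a multi-byte character. For very large files you would likely stream each chunk instead of loading it whole.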
