Java - Read file and split into multiple files

Question

I have a file that I would like to read in Java and split into n output files, where n comes from user input. Here is how I read the file:

int n = 4;
BufferedReader br = new BufferedReader(new FileReader("file.csv"));
try {
    String line = br.readLine();

    while (line != null) {
        line = br.readLine();
    }
} finally {
    br.close();
}

How do I split file.csv into n files?

Note - Since the number of entries in the file is on the order of 100k, I can't store the file content in an array, split it, and then save it into multiple files.

Recommended answer

Since the file can be very large, each split file could be large as well:

Example:

Source File Size: 5GB
Num Splits: 5
Destination File Size: 1GB each (5 files)

Reading such a large split chunk in one go is not practical, even if that much memory is available. Instead, for each split we can read repeatedly into a fixed-size byte array, using a size we know is feasible in terms of both performance and memory.
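To make the arithmetic concrete, here is a small sketch of how the split sizes and the number of buffer reads work out for the 5GB example. The class and method names (SplitPlan, plan) are hypothetical and only for illustration; it also assumes "5GB" means 5 * 1024^3 bytes and uses the 8KB read buffer that the answer's code uses.

```java
// Hypothetical helper for illustration; not part of the original answer.
class SplitPlan {

    /**
     * Returns {bytesPerSplit, leftoverBytes, fullBufferReads, tailReadBytes}:
     * the size of each split, the bytes left after dividing evenly,
     * and how one split breaks down into buffer-sized reads.
     */
    static long[] plan(long sourceSize, long numSplits, int bufferSize) {
        long bytesPerSplit = sourceSize / numSplits;
        long leftoverBytes = sourceSize % numSplits;
        long fullBufferReads = bytesPerSplit / bufferSize;
        long tailReadBytes = bytesPerSplit % bufferSize;
        return new long[] {bytesPerSplit, leftoverBytes, fullBufferReads, tailReadBytes};
    }

    public static void main(String[] args) {
        long sourceSize = 5L * 1024 * 1024 * 1024; // assumption: 5GB = 5 * 1024^3 bytes
        long[] p = plan(sourceSize, 5, 8 * 1024);
        System.out.println("bytes per split:        " + p[0]);
        System.out.println("leftover bytes:         " + p[1]);
        System.out.println("full 8KB reads / split: " + p[2]);
        System.out.println("tail read bytes:        " + p[3]);
    }
}
```

Under these assumptions each split is produced by many small reads, so memory use stays at one buffer no matter how large the source file is.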

NumSplits: 10, MaxReadBytes: 8KB

import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;

public class FileSplitter {

    public static void main(String[] args) throws Exception {
        long numSplits = 10; // from user input, extract it from args
        int maxReadBufferSize = 8 * 1024; // 8KB

        // try-with-resources closes the files even if an exception is thrown
        try (RandomAccessFile raf = new RandomAccessFile("test.csv", "r")) {
            long sourceSize = raf.length();
            long bytesPerSplit = sourceSize / numSplits;
            long remainingBytes = sourceSize % numSplits;

            for (int destIx = 1; destIx <= numSplits; destIx++) {
                try (BufferedOutputStream bw = new BufferedOutputStream(new FileOutputStream("split." + destIx))) {
                    if (bytesPerSplit > maxReadBufferSize) {
                        // copy the split in buffer-sized pieces, then the tail
                        long numReads = bytesPerSplit / maxReadBufferSize;
                        long numRemainingRead = bytesPerSplit % maxReadBufferSize;
                        for (long i = 0; i < numReads; i++) {
                            readWrite(raf, bw, maxReadBufferSize);
                        }
                        if (numRemainingRead > 0) {
                            readWrite(raf, bw, numRemainingRead);
                        }
                    } else {
                        readWrite(raf, bw, bytesPerSplit);
                    }
                }
            }
            // bytes left over after the even division go into one extra file
            if (remainingBytes > 0) {
                try (BufferedOutputStream bw = new BufferedOutputStream(new FileOutputStream("split." + (numSplits + 1)))) {
                    readWrite(raf, bw, remainingBytes);
                }
            }
        }
    }

    static void readWrite(RandomAccessFile raf, BufferedOutputStream bw, long numBytes) throws IOException {
        byte[] buf = new byte[(int) numBytes];
        // read() may return fewer bytes than requested; readFully() fills the
        // whole buffer or throws EOFException, so no stale bytes get written
        raf.readFully(buf);
        bw.write(buf);
    }
}
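
Note that the byte-based approach above can cut a CSV row in half at a split boundary. If whole lines must stay intact, one possible variation (not the answer's method) is to stream the file line by line and hand each line to one of n writers in round-robin order. This is a minimal sketch under a few assumptions: the names LineSplitter and splitByLines and the ".partN" output naming are hypothetical, the file is assumed to be UTF-8, and line order is spread across the outputs rather than kept in contiguous blocks. It does not treat a header row or quoted fields containing newlines specially.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the original answer.
class LineSplitter {

    /** Streams source line by line into n files; only one line is held in memory at a time. */
    static List<Path> splitByLines(Path source, int n) throws IOException {
        List<Path> outputs = new ArrayList<>();
        List<BufferedWriter> writers = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(source, StandardCharsets.UTF_8)) {
            // Open one writer per output file, e.g. file.csv.part1 ... file.csv.partN.
            for (int i = 1; i <= n; i++) {
                Path out = source.resolveSibling(source.getFileName() + ".part" + i);
                outputs.add(out);
                writers.add(Files.newBufferedWriter(out, StandardCharsets.UTF_8));
            }
            long lineNo = 0;
            String line;
            while ((line = reader.readLine()) != null) {
                // Line k goes to writer k mod n (round-robin).
                BufferedWriter w = writers.get((int) (lineNo % n));
                w.write(line);
                w.newLine();
                lineNo++;
            }
        } finally {
            // Closing flushes buffered output to disk.
            for (BufferedWriter w : writers) {
                w.close();
            }
        }
        return outputs;
    }

    public static void main(String[] args) throws IOException {
        // Defaults mirror the question: file.csv split into n = 4 files.
        Path source = Paths.get(args.length > 0 ? args[0] : "file.csv");
        int n = args.length > 1 ? Integer.parseInt(args[1]) : 4;
        System.out.println("Wrote: " + splitByLines(source, n));
    }
}
```

Keeping contiguous blocks of lines in each output would need the total line count up front, which means a second pass over the file; round-robin avoids that at the cost of interleaving the rows.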
