Java - Read file and split into multiple files
Problem description
I have a file that I would like to read in Java and split into n (user input) output files. Here is how I read the file:
int n = 4;
BufferedReader br = new BufferedReader(new FileReader("file.csv"));
try {
    String line = br.readLine();
    while (line != null) {
        line = br.readLine();
    }
} finally {
    br.close();
}
How do I split the file file.csv into n files?
Note: Since the number of entries in the file is on the order of 100k, I can't store the file content in an array, then split it and save it into multiple files.
Recommended answer
Since the file can be very large, each split file could be large as well:
Example:
Source file size: 5GB
Number of splits: 5
Destination file size: 1GB each (5 files)
There is no way to read such a large split chunk in one go, even if we had that much memory. Instead, for each split we can read through a fixed-size byte array, which we know is feasible in terms of both performance and memory.
NumSplits: 10, MaxReadBytes: 8KB
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;

public class FileSplitter {
    public static void main(String[] args) throws Exception
    {
        RandomAccessFile raf = new RandomAccessFile("test.csv", "r");
        long numSplits = 10; //from user input, extract it from args
        long sourceSize = raf.length();
        long bytesPerSplit = sourceSize/numSplits ;
        long remainingBytes = sourceSize % numSplits;

        int maxReadBufferSize = 8 * 1024; //8KB
        for(int destIx=1; destIx <= numSplits; destIx++) {
            BufferedOutputStream bw = new BufferedOutputStream(new FileOutputStream("split."+destIx));
            if(bytesPerSplit > maxReadBufferSize) {
                // Copy the split in 8KB pieces, then the tail that is smaller than 8KB
                long numReads = bytesPerSplit/maxReadBufferSize;
                long numRemainingRead = bytesPerSplit % maxReadBufferSize;
                for(int i=0; i<numReads; i++) {
                    readWrite(raf, bw, maxReadBufferSize);
                }
                if(numRemainingRead > 0) {
                    readWrite(raf, bw, numRemainingRead);
                }
            }else {
                readWrite(raf, bw, bytesPerSplit);
            }
            bw.close();
        }
        // Bytes left over when sourceSize is not a multiple of numSplits go into one extra file
        if(remainingBytes > 0) {
            BufferedOutputStream bw = new BufferedOutputStream(new FileOutputStream("split."+(numSplits+1)));
            readWrite(raf, bw, remainingBytes);
            bw.close();
        }
        raf.close();
    }

    static void readWrite(RandomAccessFile raf, BufferedOutputStream bw, long numBytes) throws IOException {
        byte[] buf = new byte[(int) numBytes];
        // readFully fills the whole buffer; a plain read() may return fewer bytes than requested
        raf.readFully(buf);
        bw.write(buf);
    }
}
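The answer above splits by byte count, so a CSV row can end up cut across two output files. If whole rows must stay intact, one alternative is to stream the file line by line and hand the lines out to n writers in turn. The following is only a minimal sketch of that idea, not part of the original answer: the class name LineSplitter, the method splitByLines, and the "split." prefix are hypothetical, UTF-8 is assumed, and it assumes no quoted field contains a line break. Note that round-robin distribution interleaves rows (file 1 gets rows 1, n+1, 2n+1, ...); keeping contiguous blocks would need a first pass to count the lines.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class LineSplitter {

    // Hypothetical helper: writes the lines of `source` round-robin into
    // n files named <prefix>1 .. <prefix>n inside `outDir`.
    // Only the current line is held in memory.
    public static void splitByLines(Path source, Path outDir, String prefix, int n)
            throws IOException {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        BufferedWriter[] writers = new BufferedWriter[n];
        try (BufferedReader reader = Files.newBufferedReader(source, StandardCharsets.UTF_8)) {
            for (int i = 0; i < n; i++) {
                writers[i] = Files.newBufferedWriter(
                        outDir.resolve(prefix + (i + 1)), StandardCharsets.UTF_8);
            }
            String line;
            long lineNo = 0;
            while ((line = reader.readLine()) != null) {
                // Pick the target file by line index modulo n
                BufferedWriter w = writers[(int) (lineNo % n)];
                w.write(line);
                w.newLine(); // platform line separator
                lineNo++;
            }
        } finally {
            // Close whichever writers were opened
            for (BufferedWriter w : writers) {
                if (w != null) {
                    w.close();
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // n comes from the first argument; 4 is just a default for illustration
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 4;
        splitByLines(Paths.get("file.csv"), Paths.get("."), "split.", n);
    }
}
```

With n = 2 and a five-row input, this would put rows 1, 3, 5 in split.1 and rows 2, 4 in split.2. A header row, if present, lands only in the first file; repeating it in every output would need one extra write per writer.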