Merging multiple sorted CSV files with complex comparison
Question
I have a list of sorted CSV files that I want to merge into a single sorted output file.
I don't want a simple string comparison. Each column should be compared according to a map of types that I have for every value, for example:
One of the lines:
1, 15/12/2011, David Raiven, New York
In the type map I have: first column is a long, second a date, third a string, and so on.
So the comparator should compare values accordingly.
How can I do this with the highest efficiency?
PriorityQueue? TreeMap?
I would prefer not to use third-party libraries or sorters.
The input files are enormous.
Recommended answer
Create an array (or, if you prefer, a Collection) of Readers/InputStreams, one for each CSV file.
Similar to @JustinKSU's idea, create a TreeMap whose key is one line from a CSV file. Pass it a custom Comparator, your own implementation that sorts by long, Date, etc. The value is the index of the source file in your array/Collection (probably an Integer; it could be the filename if your Collection is a Map).
Seed the TreeMap by reading the first line from each file.
Remove the lowest line using TreeMap.pollFirstEntry(), and write the key (the line) to a Writer/OutputStream. Use the value to read one more line from the appropriate file (checking for EOF) and add that into the TreeMap.
Repeat until TreeMap is empty. Close everything.
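The custom Comparator mentioned in the steps above is left to the reader. A minimal sketch of one is shown here, under stated assumptions: the class name TypedLineComparator is hypothetical, the column layout (long, date, then strings) and the dd/MM/yyyy date format are inferred from the sample line in the question, and fields are assumed not to contain quoted commas.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Comparator;

// Hypothetical comparator: column types and date format are assumptions
// taken from the sample line "1, 15/12/2011, David Raiven, New York".
class TypedLineComparator implements Comparator<String> {
    private static final DateTimeFormatter DATE = DateTimeFormatter.ofPattern("dd/MM/yyyy");

    @Override
    public int compare(String left, String right) {
        // Naive split: assumes no quoted fields that contain commas.
        String[] a = left.split(",", -1);
        String[] b = right.split(",", -1);

        // Column 0: compare numerically, so that 2 sorts before 10.
        int c = Long.compare(Long.parseLong(a[0].trim()), Long.parseLong(b[0].trim()));
        if (c != 0) return c;

        // Column 1: compare as calendar dates, not as text.
        c = LocalDate.parse(a[1].trim(), DATE).compareTo(LocalDate.parse(b[1].trim(), DATE));
        if (c != 0) return c;

        // Remaining columns: plain string comparison.
        for (int i = 2; i < Math.min(a.length, b.length); i++) {
            c = a[i].trim().compareTo(b[i].trim());
            if (c != 0) return c;
        }
        // If all shared columns are equal, the shorter line sorts first.
        return Integer.compare(a.length, b.length);
    }
}
```

This version re-parses both lines on every comparison. For very large inputs it may be worth parsing each line once when it is read and keeping the parsed values next to the raw text.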
Edit - Added Source Code below
Note that this only works if the input files are already sorted (as specified in the question).
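import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

public void mergeSort(File[] inFiles, File outFile, Comparator<String> comparator) throws IOException {
    BufferedReader[] readers = new BufferedReader[inFiles.length];
    try (PrintWriter writer = new PrintWriter(outFile)) {
        // The key pairs each line with its file index. The index only breaks ties, so
        // lines that compare equal in different files do not overwrite each other.
        Comparator<Map.Entry<String, Integer>> byLineThenFile = (a, b) -> {
            int c = comparator.compare(a.getKey(), b.getKey());
            return c != 0 ? c : a.getValue().compareTo(b.getValue());
        };
        TreeMap<Map.Entry<String, Integer>, Integer> treeMap = new TreeMap<>(byLineThenFile);
        // Read the first line of each file, skipping files that are empty.
        for (int i = 0; i < inFiles.length; i++) {
            readers[i] = new BufferedReader(new FileReader(inFiles[i]));
            String line = readers[i].readLine();
            if (line != null) {
                treeMap.put(new SimpleImmutableEntry<>(line, i), i);
            }
        }
        while (!treeMap.isEmpty()) {
            // Write the lowest pending line, then refill from the file it came from.
            Map.Entry<Map.Entry<String, Integer>, Integer> nextToGo = treeMap.pollFirstEntry();
            int fileIndex = nextToGo.getValue();
            writer.println(nextToGo.getKey().getKey());
            String line = readers[fileIndex].readLine();
            if (line != null) {
                treeMap.put(new SimpleImmutableEntry<>(line, fileIndex), fileIndex);
            }
        }
    } finally {
        // Close every reader that was opened; the writer is closed by try-with-resources.
        for (BufferedReader reader : readers) {
            if (reader != null) {
                reader.close();
            }
        }
    }
}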
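The question also asks about PriorityQueue. As an alternative to the TreeMap approach, and not part of the original answer, the same k-way merge can be sketched with a heap. A PriorityQueue accepts elements that compare equal, so duplicate lines across files need no special handling. The names HeapMerge, Head and merge are hypothetical, and the sketch assumes UTF-8 input (the default of Files.newBufferedReader) and already-sorted files.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical heap-based variant of the merge; assumes each input is already sorted.
class HeapMerge {
    // One pending line together with the reader it was read from.
    private static final class Head {
        final String line;
        final BufferedReader source;

        Head(String line, BufferedReader source) {
            this.line = line;
            this.source = source;
        }
    }

    static void merge(List<Path> inputs, Path output, Comparator<String> lineOrder) throws IOException {
        List<BufferedReader> readers = new ArrayList<>();
        Comparator<Head> byLine = (x, y) -> lineOrder.compare(x.line, y.line);
        PriorityQueue<Head> heap = new PriorityQueue<>(byLine);
        try (BufferedWriter writer = Files.newBufferedWriter(output)) {
            // Seed the heap with the first line of every non-empty file.
            for (Path input : inputs) {
                BufferedReader reader = Files.newBufferedReader(input);
                readers.add(reader);
                String first = reader.readLine();
                if (first != null) {
                    heap.add(new Head(first, reader));
                }
            }
            // Repeatedly emit the smallest pending line and refill from its source.
            while (!heap.isEmpty()) {
                Head smallest = heap.poll();
                writer.write(smallest.line);
                writer.newLine();
                String next = smallest.source.readLine();
                if (next != null) {
                    heap.add(new Head(next, smallest.source));
                }
            }
        } finally {
            for (BufferedReader reader : readers) {
                reader.close();
            }
        }
    }
}
```

Both variants hold only one line per input file in memory, and each output line costs O(log k) comparisons for k files, so the choice between them is mostly a matter of how ties are handled.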