Merging multiple sorted csv files with complex comparison

Question

I have a list of sorted CSV files that I want to merge into a single sorted output file.

I don't want a simple string comparison; instead, the values should be compared according to a map that gives the type of every value (column), e.g.:

One of the lines:
1, 15/12/2011, David Raiven, New York

In the type map I have: first column: long, second: date, third: string, ...

So the comparator should compare values accordingly.
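A minimal sketch of such a comparator, assuming comma-separated lines with no quoted commas and the dd/MM/yyyy date layout of the sample line. The class `TypedLineComparator` and its `ColumnType` enum are hypothetical names; only long, date and string are handled, and columns are compared left to right until one differs.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Comparator;
import java.util.List;

public class TypedLineComparator implements Comparator<String> {
    // Hypothetical column types; the question names long, date and string.
    public enum ColumnType { LONG, DATE, STRING }

    // Assumed date layout, matching the sample value 15/12/2011.
    private static final DateTimeFormatter DATE = DateTimeFormatter.ofPattern("dd/MM/yyyy");

    private final List<ColumnType> types;

    public TypedLineComparator(List<ColumnType> types) {
        this.types = types;
    }

    @Override
    public int compare(String a, String b) {
        // Naive split: assumes no quoted fields that contain commas.
        String[] x = a.split(",", -1);
        String[] y = b.split(",", -1);
        int columns = Math.min(types.size(), Math.min(x.length, y.length));
        for (int i = 0; i < columns; i++) {
            String u = x[i].trim();
            String v = y[i].trim();
            int c;
            switch (types.get(i)) {
                case LONG:
                    c = Long.compare(Long.parseLong(u), Long.parseLong(v));
                    break;
                case DATE:
                    c = LocalDate.parse(u, DATE).compareTo(LocalDate.parse(v, DATE));
                    break;
                default:
                    c = u.compareTo(v);
            }
            if (c != 0) {
                return c; // first differing typed column decides the order
            }
        }
        // All typed columns are equal: the row with fewer fields sorts first.
        return Integer.compare(x.length, y.length);
    }
}
```

Each comparison re-parses both lines; for very large inputs it may be cheaper to parse every line once into a key object and compare those instead.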

How can I do this most efficiently?
With a PriorityQueue? A TreeMap?

I prefer not to use 3rd-party libraries or sorters.
The input files are enormous.

Recommended answer

Create an array (or, if you prefer, a Collection) of Readers/InputStreams, one for each CSV file.

Similar to @JustinKSU's idea, create a TreeMap whose key is one line from a CSV file. Pass it a custom Comparator, your own implementation that orders lines by long, Date, etc. The value is the index of the file in your array/Collection (probably an Integer; it could be the filename if your Collection is a Map).
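One property of this design is worth checking: a TreeMap decides key equality with its comparator, not with `equals()`, so when the current lines of two files compare as 0 they share one entry and `put()` replaces the stored file index. The file whose index was replaced is then never read again. This small sketch demonstrates the collision with a hypothetical comparator that looks only at the first column (the second sample row is made up for illustration), and shows that breaking ties with the natural String order keeps both lines. Textually identical lines from two files would still collide, so duplicates need separate handling.

```java
import java.util.Comparator;
import java.util.TreeMap;

public class TreeMapKeyCollision {
    // Hypothetical comparator that orders lines by the first CSV column only.
    public static final Comparator<String> BY_FIRST_COLUMN =
            Comparator.comparingLong(line -> Long.parseLong(line.split(",")[0].trim()));

    // Seeds a TreeMap the way the answer does (line -> file index) and reports its size.
    public static int entriesAfterSeeding(Comparator<String> comparator, String... firstLines) {
        TreeMap<String, Integer> treeMap = new TreeMap<>(comparator);
        for (int i = 0; i < firstLines.length; i++) {
            treeMap.put(firstLines[i], i);
        }
        return treeMap.size();
    }

    public static void main(String[] args) {
        String[] seeds = {
            "1, 15/12/2011, David Raiven, New York",
            "1, 03/01/2012, Someone Else, Boston" // hypothetical second file's first line
        };
        // Both lines have first column 1, so the second put() overwrites the first entry.
        System.out.println(entriesAfterSeeding(BY_FIRST_COLUMN, seeds)); // 1
        // Breaking ties with the natural String order keeps both lines.
        System.out.println(entriesAfterSeeding(
                BY_FIRST_COLUMN.thenComparing(Comparator.<String>naturalOrder()), seeds)); // 2
    }
}
```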

Seed the TreeMap by reading the first line from each file.

Remove the lowest line using TreeMap.pollFirstEntry(), and write the key (the line) to a Writer/OutputStream. Use the value to read one more line from the appropriate file (checking for EOF) and add that into the TreeMap.

Repeat until TreeMap is empty. Close everything.
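Since the question also asks about PriorityQueue, here is a minimal sketch of the same k-way merge with a `java.util.PriorityQueue` swapped in for the TreeMap; it is an alternative, not the answer's code. It takes Readers instead of Files (an assumption so it can run on in-memory input), and `Head` is a hypothetical helper class. A PriorityQueue may hold elements that compare equal, so lines from different files that tie under the comparator do not overwrite each other.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.Reader;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class PriorityQueueMerge {
    // Hypothetical helper: one pending line plus the index of the input it came from.
    private static final class Head {
        final String line;
        final int source;

        Head(String line, int source) {
            this.line = line;
            this.source = source;
        }
    }

    public static void merge(List<? extends Reader> inputs, PrintWriter out,
                             Comparator<String> comparator) throws IOException {
        BufferedReader[] readers = new BufferedReader[inputs.size()];
        // Order by the line; equal lines are allowed and go to the lower input index first.
        PriorityQueue<Head> queue = new PriorityQueue<>(
                Comparator.comparing((Head h) -> h.line, comparator)
                          .thenComparingInt(h -> h.source));
        try {
            // Seed the queue with the first line of every input; empty inputs add nothing.
            for (int i = 0; i < readers.length; i++) {
                readers[i] = new BufferedReader(inputs.get(i));
                String first = readers[i].readLine();
                if (first != null) {
                    queue.add(new Head(first, i));
                }
            }
            // Emit the smallest pending line, then refill from the same input.
            while (!queue.isEmpty()) {
                Head next = queue.poll();
                out.println(next.line);
                String line = readers[next.source].readLine();
                if (line != null) {
                    queue.add(new Head(line, next.source));
                }
            }
        } finally {
            for (BufferedReader reader : readers) {
                if (reader != null) {
                    reader.close();
                }
            }
        }
    }
}
```

Either structure costs O(log k) per insertion and removal for k input files, so merging n lines in total is O(n log k); only k lines are held in memory at a time, which suits enormous inputs.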

Edit - Added Source Code below

Note that this only works if the input files are already sorted (as was specified in the question)!

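import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CsvMerger {

   public void mergeSort(File[] inFiles, File outFile, Comparator<String> comparator) throws IOException {
      // TreeMap treats keys that compare as 0 as the same key, so break ties with
      // plain String order: only textually identical lines can now share a key.
      Comparator<String> order = comparator.thenComparing(Comparator.<String>naturalOrder());
      // key = pending line, value = indices of all files whose current line is that key
      TreeMap<String, List<Integer>> treeMap = new TreeMap<>(order);
      BufferedReader[] readers = new BufferedReader[inFiles.length];

      try (PrintWriter writer = new PrintWriter(outFile)) {
         // read first line of each file; empty files (immediate EOF) are skipped
         for (int i = 0; i < inFiles.length; i++) {
            readers[i] = new BufferedReader(new FileReader(inFiles[i]));
            addLine(treeMap, readers[i].readLine(), i);
         }

         while (!treeMap.isEmpty()) {
            Map.Entry<String, List<Integer>> nextToGo = treeMap.pollFirstEntry();
            for (int fileIndex : nextToGo.getValue()) {
               writer.println(nextToGo.getKey());
               // refill from the file that supplied this line (null means EOF)
               addLine(treeMap, readers[fileIndex].readLine(), fileIndex);
            }
         }
      } finally {
         // close every reader that was opened, even if an exception occurred
         for (BufferedReader reader : readers) {
            if (reader != null) {
               reader.close();
            }
         }
      }
   }

   private static void addLine(TreeMap<String, List<Integer>> treeMap, String line, int fileIndex) {
      if (line != null) {
         treeMap.computeIfAbsent(line, k -> new ArrayList<>()).add(fileIndex);
      }
   }
}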
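The merge above relies on each input already being sorted under the same comparator; if a file is not, the output is silently out of order. A minimal sketch of a one-pass pre-check, assuming it is acceptable to read each file one extra time; `SortedCheck` and `firstOutOfOrderLine` are hypothetical names, and a real file can be passed as `new FileReader(file)`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.Comparator;

public class SortedCheck {
    // Returns the 1-based number of the first line that sorts before the line
    // preceding it, or -1 if the whole input is in order under the comparator.
    public static long firstOutOfOrderLine(Reader input, Comparator<String> comparator) throws IOException {
        try (BufferedReader reader = new BufferedReader(input)) {
            String previous = reader.readLine();
            long lineNumber = 1;
            String current;
            // An empty input reads null immediately, so the loop body never runs.
            while ((current = reader.readLine()) != null) {
                lineNumber++;
                if (comparator.compare(previous, current) > 0) {
                    return lineNumber;
                }
                previous = current;
            }
            return -1;
        }
    }
}
```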
