Optimal way to sort a txt file in Java

Problem description

I've got a CSV file that I'm processing with the opencsv library, so I can read in each line. The particular transformation I need to do requires me to sort that file first, before I run through it with the main portion of my Java program.

For example

5423, blah2, blah
5323, blah3, blah
5423, blah4, blah
5444, blah5, blah
5423, blah6, blah

should become

5323, blah3, blah
5423, blah2, blah
5423, blah4, blah
5423, blah6, blah
5444, blah5, blah

etc.

The reason I need to do this is that I'm combining all rows with the same id and outputting them to a new file.

What I have in mind:

  1. Read each line of the CSV with the opencsv library.
  2. Add the lines to a two-dimensional array.
  3. Sort the array somehow.
  4. Loop through the sorted array and output it to a file.
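Steps 2 to 4 of that plan can be sketched in plain Java roughly as follows. This is an illustration under stated assumptions: the class name `SortById` is hypothetical, lines are split naively on commas (no quoted fields, which opencsv would handle), and the rows are printed instead of written to a file.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortById {
    // Hypothetical helper: turns each line into a String[] row (step 2)
    // and sorts the rows by the numeric id in column 0 (step 3).
    // Assumes simple comma-separated lines without quoted fields.
    static List<String[]> sortRows(List<String> lines) {
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            rows.add(line.split(","));
        }
        // List.sort is guaranteed stable, so rows sharing an id keep their input order
        rows.sort(Comparator.comparingInt(r -> Integer.parseInt(r[0].trim())));
        return rows;
    }

    public static void main(String[] args) {
        List<String> input = List.of(
            "5423, blah2, blah",
            "5323, blah3, blah",
            "5423, blah4, blah",
            "5444, blah5, blah",
            "5423, blah6, blah");
        // Step 4: output the sorted rows (stdout here, a file in the real program)
        for (String[] row : sortRows(input)) {
            System.out.println(String.join(",", row));
        }
    }
}
```

Parsing the id as an `int` avoids the pitfall of string comparison, where "13" would sort before "5".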

Any other ideas on this and what is the best way to sort the data?

Bit rusty on my Java.

Update: clarifying the final output

It will look like:

5323, blah3, blah
5423, blah2!!blah4!!blah6, blah
5444, blah5, blah

This is a very simplified version of what I'm doing. It is actually needed for multi-option fields in a JBase system; this is the requested file format.
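The combining step can also be done in a single pass, without a separate sort, by grouping into a `TreeMap` keyed by the numeric id. The sketch below uses only the JDK and makes assumptions the question does not settle: the class name `CombineById` is hypothetical, lines are split naively on commas, and the third column is taken from the first row seen for each id.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CombineById {
    // Groups rows by numeric id and joins the second column with "!!".
    // Assumption: the third column comes from the first row seen for each id.
    static List<String> combine(List<String> lines) {
        // TreeMap iterates ids in ascending numeric order, so no separate sort pass is needed
        Map<Integer, List<String>> joined = new TreeMap<>();
        Map<Integer, String> third = new HashMap<>();
        for (String line : lines) {
            String[] f = line.split(",");
            int id = Integer.parseInt(f[0].trim());
            joined.computeIfAbsent(id, k -> new ArrayList<>()).add(f[1].trim());
            third.putIfAbsent(id, f[2].trim());
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<Integer, List<String>> e : joined.entrySet()) {
            out.add(e.getKey() + ", " + String.join("!!", e.getValue()) + ", " + third.get(e.getKey()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = List.of(
            "5423, blah2, blah",
            "5323, blah3, blah",
            "5423, blah4, blah",
            "5444, blah5, blah",
            "5423, blah6, blah");
        // Prints: 5323, blah3, blah / 5423, blah2!!blah4!!blah6, blah / 5444, blah5, blah
        combine(input).forEach(System.out::println);
    }
}
```

Values for each id are kept in input order here; sort each list first if the JBase side expects them ordered.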

There are over 100,000 lines in the original file.

This will be run more than once and the speed it runs is important to me.

Recommended answer

To accomplish the most recent request, I would highly suggest using Multimap from Google Collections (now part of Guava). Your code would look like:

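import com.google.common.collect.Multimap;
import com.google.common.collect.TreeMultimap;
import com.opencsv.CSVReader; // older opencsv releases: au.com.bytecode.opencsv.CSVReader
import com.opencsv.CSVWriter;
import java.util.Collection;
import java.util.Map;

CSVReader reader = ...;
CSVWriter writer = ...;

// TreeMultimap keeps the keys (and each key's values) sorted;
// as a set multimap it also drops duplicate id/value pairs
Multimap<String, String> results = TreeMultimap.create();

// read the file: group column 1 under the id in column 0
String[] line;
while ((line = reader.readNext()) != null) {
    results.put(line[0], line[1]);
}

// output the file: one row per id, its values joined with "!!"
Map<String, Collection<String>> mapView = results.asMap();
for (Map.Entry<String, Collection<String>> entry : mapView.entrySet()) {
    String[] nextLine = new String[2];
    nextLine[0] = entry.getKey();
    nextLine[1] = String.join("!!", entry.getValue());
    writer.writeNext(nextLine);
}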

You need to use "blah\n" as your line ender. If you care about speed, but not so much about having the entries sorted, you should benchmark against HashMultimap as well.
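A rough way to run that comparison without Guava is to time the same grouping into a JDK `TreeMap` (sorted, analogous to `TreeMultimap`) and a `HashMap` (unsorted, analogous to `HashMultimap`). This is only a sketch: the class name `GroupingTiming` and the synthetic data are hypothetical, and a single `System.nanoTime` measurement is skewed by JIT warm-up, so a harness such as JMH gives more trustworthy numbers.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class GroupingTiming {
    // Groups {id, value} rows into the given map; the map type decides whether keys come out sorted.
    static Map<String, List<String>> group(List<String[]> rows, Map<String, List<String>> target) {
        for (String[] r : rows) {
            target.computeIfAbsent(r[0], k -> new ArrayList<>()).add(r[1]);
        }
        return target;
    }

    public static void main(String[] args) {
        // Synthetic input: 100,000 rows over 20,000 five-digit ids,
        // so string order and numeric order of the keys agree
        List<String[]> rows = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            rows.add(new String[] {String.valueOf(10_000 + i % 20_000), "blah" + i});
        }
        long t0 = System.nanoTime();
        Map<String, List<String>> sorted = group(rows, new TreeMap<>());
        long t1 = System.nanoTime();
        Map<String, List<String>> unsorted = group(rows, new HashMap<>());
        long t2 = System.nanoTime();
        System.out.printf("TreeMap: %d ms, HashMap: %d ms, same groups: %b%n",
            (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, sorted.equals(unsorted));
    }
}
```

If the hash-based version wins, the ids can still be sorted once at output time, which touches only the distinct keys rather than every line.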

My previous answer:

The most straightforward way is to use the sort command in *nix (e.g. Linux and Mac OS), like

sort -n myfile.csv

Windows has a sort command as well, but it sorts the lines alphabetically (i.e. '5,' lines would be placed after '13,' lines).
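The same difference shows up in Java: sorting lines as plain strings compares them character by character, while parsing the leading id gives numeric order. A small illustrative sketch (the class name `OrderDemo` is hypothetical):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class OrderDemo {
    // String order compares character by character, so "13, b" sorts before "5, a" ('1' < '5')
    static List<String> lexicographic(List<String> lines) {
        List<String> copy = new ArrayList<>(lines);
        Collections.sort(copy);
        return copy;
    }

    // Numeric order on the leading id, which is what `sort -n` gives for lines like these
    static List<String> numeric(List<String> lines) {
        List<String> copy = new ArrayList<>(lines);
        copy.sort(Comparator.comparingInt(s -> Integer.parseInt(s.substring(0, s.indexOf(',')).trim())));
        return copy;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("5, a", "13, b");
        System.out.println(lexicographic(lines)); // prints [13, b, 5, a]
        System.out.println(numeric(lines));       // prints [5, a, 13, b]
    }
}
```

When every id has the same number of digits, as in the question's sample, both orders happen to agree.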

However, there is nothing wrong with the suggested solution. Instead of constructing the array and sorting it, you could also just use a TreeSet.
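A minimal sketch of the TreeSet variant, assuming lines start with a numeric id followed by a comma (the class name `TreeSetSort` is hypothetical). The comparator needs a tie-breaker: a `TreeSet` treats elements that compare as equal as duplicates, so comparing on the id alone would keep only one line per id.

```java
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

public class TreeSetSort {
    // Orders lines by numeric id, then by the full line text as a tie-breaker,
    // so different lines with the same id are all kept.
    static final Comparator<String> BY_ID =
        Comparator.<String>comparingInt(s -> Integer.parseInt(s.substring(0, s.indexOf(',')).trim()))
                  .thenComparing(Comparator.naturalOrder());

    static TreeSet<String> sortLines(List<String> lines) {
        TreeSet<String> set = new TreeSet<>(BY_ID);
        set.addAll(lines); // each insertion is O(log n)
        return set;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("5423, blah2, blah", "5323, blah3, blah", "5423, blah4, blah");
        sortLines(lines).forEach(System.out::println);
    }
}
```

Note that byte-for-byte identical lines still collapse into one entry; if the file can contain repeated rows that must survive, sorting a `List` is the safer choice.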
