How to merge CSV files in Java


Question

My first CSV file looks like this, with the header included (the header appears only once at the top, not after every entry):

NAME,SURNAME,AGE
Fred,Krueger,Unknown
.... n records



My second file might look like this:

NAME,MIDDLENAME,SURNAME,AGE
Jason,Noname,Scarry,16
.... n records with this header template

The merged file should look like this:

NAME,SURNAME,AGE,MIDDLENAME
Fred,Krueger,Unknown,
Jason,Scarry,16,Noname
....

UPDATE:

The CSV samples above were shortened to illustrate what I want to achieve. In reality, the CSV files are generated in the step before this merge and can have up to 100 columns.

Does anyone have an idea how I can do this? I'd appreciate any help.

Recommended answer

I'd create a model for the 'bigger' format (a simple class with four fields, plus a collection for instances of this class) and implement two parsers, one for the first format and one for the second. Create records for all rows of both CSV files and implement a writer that outputs the CSV in the correct format. In brief:

 public void convert(File output, File... inputs) {

   List<Record> records = new ArrayList<Record>();
   for (File file : inputs) {
     // isThreeColumnFormat(...) is a placeholder, e.g. a check on the file's header line
     if (isThreeColumnFormat(file)) {
        records.addAll(ThreeColumnFormatParser.parse(file));
     } else {
        records.addAll(FourColumnFormatParser.parse(file));
     }
   }
   CsvWriter.write(output, records);
 }
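
The parser classes in the snippet above are left undefined. As a rough sketch only, the fixed-model idea could look like the following for the two sample formats from the question. The names `FixedFormatParsers`, `Person`, `parseThreeColumn`, `parseFourColumn` and `toCsv` are made up for illustration, input is taken as already-read lines rather than a `File`, and the naive `split(",")` assumes no quoted fields or embedded commas.

```java
import java.util.ArrayList;
import java.util.List;

final class FixedFormatParsers {

    /** Model for the wider format; middleName stays empty for three-column rows. */
    static final class Person {
        final String name;
        final String middleName;
        final String surname;
        final String age;

        Person(String name, String middleName, String surname, String age) {
            this.name = name;
            this.middleName = middleName;
            this.surname = surname;
            this.age = age;
        }

        /** Renders the row in the merged column order NAME,SURNAME,AGE,MIDDLENAME. */
        String toCsv() {
            return String.join(",", name, surname, age, middleName);
        }
    }

    /** Parses lines laid out as NAME,SURNAME,AGE; the first line is the header and is skipped. */
    static List<Person> parseThreeColumn(List<String> lines) {
        List<Person> people = new ArrayList<>();
        if (lines.isEmpty()) {
            return people;
        }
        for (String line : lines.subList(1, lines.size())) {
            String[] v = line.split(",", -1); // -1 keeps trailing empty fields
            people.add(new Person(v[0], "", v[1], v[2]));
        }
        return people;
    }

    /** Parses lines laid out as NAME,MIDDLENAME,SURNAME,AGE; the first line is the header and is skipped. */
    static List<Person> parseFourColumn(List<String> lines) {
        List<Person> people = new ArrayList<>();
        if (lines.isEmpty()) {
            return people;
        }
        for (String line : lines.subList(1, lines.size())) {
            String[] v = line.split(",", -1);
            people.add(new Person(v[0], v[1], v[2], v[3]));
        }
        return people;
    }
}
```

This only works while the set of formats is small and known in advance. With up to 100 columns and many header variants, a model keyed by column name scales better.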







From your comment I see that you have a lot of different CSV formats with some common columns.

You could define the model for any row in the various CSV files like this:

import java.util.HashMap;
import java.util.Map;

public class Record {
  Object id; // some sort of unique identifier
  Map<String, String> values = new HashMap<String, String>(); // all key/values of a single row
  public Record(Object id) {this.id=id;}
  public void put(String key, String value){
    values.put(key, value);
  }
  public String get(String key) {
    return values.get(key); // null if this row's file had no such column
  }
}

To parse any file, you would first read the header and add the column headers to a global keystore (it will be needed later for output), then create records for all rows, like:

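//...
List<Record> records = new ArrayList<Record>();
Set<String> keyStore = new LinkedHashSet<String>();  // the store is a Set; keeps first-seen order

for (File file : getAllFiles()) {                     // getAllFiles() is a placeholder
  List<String> lines = Files.readAllLines(file.toPath());
  List<String> keys = Arrays.asList(lines.get(0).split(DELIMITER));  // header row
  keyStore.addAll(keys);
  for (int row = 1; row < lines.size(); row++) {      // data rows follow the header
    String[] values = lines.get(row).split(DELIMITER, -1);  // -1 keeps trailing empty fields
    Record record = new Record(file.getName() + row);  // as an example for id
    for (int i = 0; i < values.length && i < keys.size(); i++) {
      record.put(keys.get(i), values[i]);
    }
    records.add(record);
  }
}
// ...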

Now the keystore holds every column header name that was used. We can iterate over the collection of all records, look up the value for each key (getting null if the record's file didn't use that key), assemble the CSV lines, and write everything to a new file.
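
The output step is only described in words, so here is a minimal end-to-end sketch of the whole approach under some loud assumptions. `CsvMerger` and `merge` are hypothetical names. Each file is passed in as a list of already-read lines whose first line is the header. Fields are split on a plain comma, so quoted fields and embedded commas are not handled. A `LinkedHashSet` stands in for the keystore so that columns come out in first-seen order, and missing values are written as empty cells rather than `null`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class CsvMerger {
    private static final String DELIMITER = ",";

    /** Merges CSV files given as lists of lines; the first line of each list is its header. */
    static List<String> merge(List<List<String>> files) {
        Set<String> keyStore = new LinkedHashSet<>();          // every header seen, in first-seen order
        List<Map<String, String>> records = new ArrayList<>(); // one key/value map per data row

        for (List<String> lines : files) {
            if (lines.isEmpty()) {
                continue; // nothing to read, not even a header
            }
            String[] keys = lines.get(0).split(DELIMITER, -1);
            for (String key : keys) {
                keyStore.add(key);
            }
            for (String line : lines.subList(1, lines.size())) {
                String[] values = line.split(DELIMITER, -1);   // -1 keeps trailing empty fields
                Map<String, String> record = new LinkedHashMap<>();
                for (int i = 0; i < keys.length && i < values.length; i++) {
                    record.put(keys[i], values[i]);
                }
                records.add(record);
            }
        }

        List<String> out = new ArrayList<>();
        out.add(String.join(DELIMITER, keyStore));             // merged header line
        for (Map<String, String> record : records) {
            List<String> cells = new ArrayList<>();
            for (String key : keyStore) {
                // Empty cell when this record's file did not have the column.
                cells.add(record.getOrDefault(key, ""));
            }
            out.add(String.join(DELIMITER, cells));
        }
        return out;
    }
}
```

Calling `CsvMerger.merge` with the two sample files from the question yields the header `NAME,SURNAME,AGE,MIDDLENAME` followed by `Fred,Krueger,Unknown,` and `Jason,Scarry,16,Noname`. For real files, the lines could be read with `Files.readAllLines` and the result written with `Files.write`. For data that may contain quoted fields, a proper CSV parsing library would be a safer choice than `split`.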
