Comparing two CSV files in Java


Question


We need to compare two CSV files. Say the first file has a few rows, and the second file could have the same number of rows or more. Most of the rows could remain the same in both files. We are looking for the best approach to diff these two files and read only those rows in the second file that differ from the first file. The application processing the files is written in Java.


What are the best approaches for this?


Note: it would be great if we could tell whether a row was updated, inserted or deleted in the second file.

Requirements:

  1. There won't be any duplicate records.
  2. File 1 and file 2 could have the same number of records, with a few rows having updated values in file 2 (records updated).
  3. File 2 could have a few rows removed (treated as records deleted).
  4. File 2 could have a few new rows added (treated as records inserted).
  5. One of the columns can be treated as the primary key of the record; it won't change in either file.


Recommended answer


One way to do this is to use Java's Set interface: read each line as a string, add it to a set, then call removeAll() on the first set with the second set as the argument. The first set is then left with only the rows that do not appear in the second file. This, of course, assumes that there are no duplicate rows in the files.

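import java.io.File;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import org.apache.commons.io.FileUtils;
// Using Apache Commons IO's FileUtils to read in the files; readLines takes a File and a charset.
HashSet<String> f1 = new HashSet<String>(FileUtils.readLines(new File("file1.csv"), StandardCharsets.UTF_8));
HashSet<String> f2 = new HashSet<String>(FileUtils.readLines(new File("file2.csv"), StandardCharsets.UTF_8));
f1.removeAll(f2); // f1 now contains only the lines which are not in f2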
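
The question asks for the rows of the second file that differ from the first, which is the set difference taken in the other direction. A minimal sketch of that, using only the JDK and in-memory lists in place of the files; the class name `LineSetDiff`, the method `onlyInSecond` and the sample rows are illustrative and not part of the original answer:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class LineSetDiff {
    /** Returns the lines of second that do not appear verbatim in first, in file order. */
    public static Set<String> onlyInSecond(List<String> first, List<String> second) {
        Set<String> result = new LinkedHashSet<>(second); // LinkedHashSet keeps insertion order
        result.removeAll(new HashSet<>(first));
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample data standing in for the contents of the two files.
        List<String> file1 = Arrays.asList("1,alice,10", "2,bob,20", "3,carol,30");
        List<String> file2 = Arrays.asList("1,alice,10", "2,bob,25", "4,dave,40");

        // New or changed rows in file 2
        System.out.println(onlyInSecond(file1, file2)); // [2,bob,25, 4,dave,40]
        // Rows of file 1 that are gone or changed in file 2
        System.out.println(onlyInSecond(file2, file1)); // [2,bob,20, 3,carol,30]
    }
}
```

With real files, the lists could come from `Files.readAllLines(Paths.get("file1.csv"))`. A plain line comparison like this cannot tell an updated row from an inserted one; that needs the key-based approach.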

Update


Okay, so you have a PK field. I'll just assume you know how to extract it from each line; use openCSV, a regex, or whatever you prefer. Use a HashMap instead of the HashSet above, with the PK as the key and the row as the value.

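import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

HashMap<String, String> f1 = new HashMap<String, String>();
HashMap<String, String> f2 = new HashMap<String, String>();
// read f1, f2; use PK field as the key
List<String> deleted = new ArrayList<String>();
List<String> updated = new ArrayList<String>();
// iterate over the entries (not the keys) so both key and row are available
for (Map.Entry<String, String> entry : f1.entrySet()) {
    if (!f2.containsKey(entry.getKey())) {
        // key exists only in file 1: the record was deleted
        deleted.add(entry.getValue());
    } else {
        if (!f2.get(entry.getKey()).equals(entry.getValue())) {
            // key exists in both files but the row text differs: the record was updated
            // (this stores the file 1 version of the row)
            updated.add(entry.getValue());
        }
    }
}
// drop every key that also exists in file 1
for (String key : f1.keySet()) {
    f2.remove(key);
}
// f2 now contains only "new" rows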
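
For reference, the same idea can be put together as a small self-contained program. This is only a sketch under stated assumptions: the primary key is taken to be the first comma-separated field, rows are split naively on commas (so quoted fields containing commas are not handled), and the names `CsvKeyDiff`, `Result`, `index` and `diff` are made up for illustration. Unlike the snippet above, it reports the file 2 version of an updated row and leaves both maps unmodified.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CsvKeyDiff {
    /** Holds the three kinds of change between the old and the new file. */
    public static class Result {
        public final List<String> inserted = new ArrayList<>();
        public final List<String> updated = new ArrayList<>();
        public final List<String> deleted = new ArrayList<>();
    }

    /** Indexes rows by primary key; ASSUMPTION: the key is the first comma-separated field. */
    static Map<String, String> index(List<String> lines) {
        Map<String, String> byKey = new LinkedHashMap<>(); // preserves file order
        for (String line : lines) {
            String key = line.split(",", 2)[0].trim();
            byKey.put(key, line);
        }
        return byKey;
    }

    public static Result diff(List<String> oldLines, List<String> newLines) {
        Map<String, String> oldRows = index(oldLines);
        Map<String, String> newRows = index(newLines);
        Result result = new Result();
        for (Map.Entry<String, String> e : oldRows.entrySet()) {
            String newRow = newRows.get(e.getKey());
            if (newRow == null) {
                result.deleted.add(e.getValue());   // key missing from file 2
            } else if (!newRow.equals(e.getValue())) {
                result.updated.add(newRow);         // same key, different row text
            }
        }
        for (Map.Entry<String, String> e : newRows.entrySet()) {
            if (!oldRows.containsKey(e.getKey())) {
                result.inserted.add(e.getValue());  // key present only in file 2
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample data standing in for the contents of the two files.
        List<String> file1 = Arrays.asList("1,alice,10", "2,bob,20", "3,carol,30");
        List<String> file2 = Arrays.asList("1,alice,10", "2,bob,25", "4,dave,40");
        Result r = diff(file1, file2);
        System.out.println("inserted: " + r.inserted); // inserted: [4,dave,40]
        System.out.println("updated:  " + r.updated);  // updated:  [2,bob,25]
        System.out.println("deleted:  " + r.deleted);  // deleted:  [3,carol,30]
    }
}
```

For real input, the lists could be read with `Files.readAllLines(Paths.get("file1.csv"))`, and a CSV parser such as openCSV would be a safer way to extract the key than `split`.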
