Comparing two CSV files in Java

Views: 1684 · Published: 2017/2/24 18:31:05 · Tags: java, csv


Problem description

We need to compare two CSV files. Say file one has a few rows, and the second file could have the same number of rows or more. Most of the rows could remain the same in both files. We are looking for the best approach to diff these two files and read only those rows in the second file that differ from the first file. The application processing the files is written in Java.

What are the best approaches for this?

Note: it would be great if we could tell whether a row was updated, inserted, or deleted in the second file.

Requirements:

There won't be any duplicate records.
File 1 and file 2 could have the same number of records, with a few rows having updated values in file 2 (records updated).
File 2 could have a few rows removed (this is treated as record deleted).
File 2 could have a few new rows added (this is treated as record inserted).
One of the columns can be treated as the primary key of the record; it won't change in either file.

Recommended answer

One way to do this is to use Java's Set interface: read each line as a string, add it to a set, then call removeAll() on the first set with the second set as the argument, so that the first set retains only the rows that differ. This, of course, assumes that there are no duplicate rows in the files.

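import java.io.File;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import org.apache.commons.io.FileUtils;
// using FileUtils (Apache Commons IO) to read in the files; readLines takes a File and a charset.
HashSet<String> f1 = new HashSet<String>(FileUtils.readLines(new File("file1.csv"), StandardCharsets.UTF_8));
HashSet<String> f2 = new HashSet<String>(FileUtils.readLines(new File("file2.csv"), StandardCharsets.UTF_8));
f1.removeAll(f2); // f1 now contains only the lines which are not in f2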
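
For anyone who would rather avoid the extra dependency, here is a minimal sketch of the same set-difference idea using only the standard library. The class name CsvLineDiff and the helper linesOnlyIn are made up for illustration and are not part of the original answer. Note that it calls the helper with file 2 first, so it prints the lines that exist only in file 2 (new rows plus the new versions of updated rows), which is the direction the question asks about.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class CsvLineDiff {

    // Hypothetical helper: returns the lines of `first` that do not appear
    // anywhere in `second`. Assumes neither input contains duplicate lines.
    public static Set<String> linesOnlyIn(List<String> first, List<String> second) {
        Set<String> result = new LinkedHashSet<>(first); // preserves file order
        result.removeAll(new HashSet<>(second));         // hash lookups, not list scans
        return result;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java CsvLineDiff <file1.csv> <file2.csv>");
            return;
        }
        List<String> file1 = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        List<String> file2 = Files.readAllLines(Paths.get(args[1]), StandardCharsets.UTF_8);

        // Lines present in file 2 but not in file 1.
        for (String line : linesOnlyIn(file2, file1)) {
            System.out.println(line);
        }
    }
}
```

Because this compares whole lines as strings, it cannot tell an updated row from an inserted one; that needs the key-based approach.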

Update

Okay, so you have a PK field. I'll just assume you know how to get that from your string; use openCSV or regex or whatever you want. Make an actual HashMap instead of a HashSet as above, using the PK as the key and the row as the value.

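import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

HashMap<String, String> f1 = new HashMap<String, String>();
HashMap<String, String> f2 = new HashMap<String, String>();
// read f1, f2; use PK field as the key
List<String> deleted = new ArrayList<String>();
List<String> updated = new ArrayList<String>();
for (Map.Entry<String, String> entry : f1.entrySet()) {
    if (!f2.containsKey(entry.getKey())) {
        // key is missing from file 2: the record was deleted
        deleted.add(entry.getValue());
    } else {
        if (!f2.get(entry.getKey()).equals(entry.getValue())) {
            // same key, different row contents: the record was updated
            // (this stores the file 1 version of the row)
            updated.add(entry.getValue());
        }
    }
}
// drop every key that also exists in file 1
for (String key : f1.keySet()) {
    f2.remove(key);
}
// f2 now contains only "new" rows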
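
The key-based comparison above can be put together as a small self-contained sketch. Everything named here (CsvKeyedDiff, Diff, primaryKey, index) is hypothetical. The sketch assumes the primary key is the first column and contains no quoted commas; a real CSV parser such as openCSV would be the safer way to extract it. Unlike the snippet above, it records the file 2 version of each updated row and leaves both maps unmodified.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CsvKeyedDiff {

    // Result buckets; each list holds full CSV rows.
    public static final class Diff {
        public final List<String> inserted = new ArrayList<>();
        public final List<String> updated = new ArrayList<>();
        public final List<String> deleted = new ArrayList<>();
    }

    // Assumption: the primary key is everything before the first comma.
    public static String primaryKey(String row) {
        int comma = row.indexOf(',');
        return comma < 0 ? row : row.substring(0, comma);
    }

    // Builds a key -> row map, preserving the order of the input rows.
    public static Map<String, String> index(List<String> rows) {
        Map<String, String> byKey = new LinkedHashMap<>();
        for (String row : rows) {
            byKey.put(primaryKey(row), row);
        }
        return byKey;
    }

    public static Diff diff(List<String> oldRows, List<String> newRows) {
        Map<String, String> oldByKey = index(oldRows);
        Map<String, String> newByKey = index(newRows);
        Diff result = new Diff();

        for (Map.Entry<String, String> e : newByKey.entrySet()) {
            String oldRow = oldByKey.get(e.getKey());
            if (oldRow == null) {
                result.inserted.add(e.getValue());   // key only in file 2
            } else if (!oldRow.equals(e.getValue())) {
                result.updated.add(e.getValue());    // same key, changed row (file 2 version)
            }
        }
        for (Map.Entry<String, String> e : oldByKey.entrySet()) {
            if (!newByKey.containsKey(e.getKey())) {
                result.deleted.add(e.getValue());    // key only in file 1
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample rows standing in for the two files.
        List<String> file1 = Arrays.asList("1,alice", "2,bob", "3,carol");
        List<String> file2 = Arrays.asList("1,alice", "2,robert", "4,dave");
        Diff d = diff(file1, file2);
        System.out.println("inserted: " + d.inserted);
        System.out.println("updated:  " + d.updated);
        System.out.println("deleted:  " + d.deleted);
    }
}
```

Both files are held in memory, which is fine for modest inputs; very large files would likely call for sorting by key and streaming instead.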
