Remove duplicate rows from a CSV file without writing a new file

Question

This is my current code:

File file1 = new File("file1.csv");
File file2 = new File("file2.csv");
HashSet<String> f1 = new HashSet<>(FileUtils.readLines(file1));
HashSet<String> f2 = new HashSet<>(FileUtils.readLines(file2));
f2.removeAll(f1);

With removeAll() I remove from file2's lines every line that also appears in file1, but now I want to avoid creating a new CSV file, to optimize the process. I just want to delete the duplicate rows from file2.

Is this possible, or do I have to create a new file?

Recommended answer

"Now I want to avoid creating a new CSV file to optimize the process."

Well, sure, you can do that... if you don't mind possibly losing the file!

Don't do it.

Write the filtered lines to a temporary file next to file2 instead, then move that file over the original. And since you use Java 7, use java.nio.file. Here's an example:

final Path file1 = Paths.get("file1.csv");
final Path file2 = Paths.get("file2.csv");
final Path tmpfile = file2.resolveSibling("file2.csv.new");

final Set<String> file1Lines 
    = new HashSet<>(Files.readAllLines(file1, StandardCharsets.UTF_8));

try (
    final BufferedReader reader = Files.newBufferedReader(file2,
        StandardCharsets.UTF_8);
    final BufferedWriter writer = Files.newBufferedWriter(tmpfile,
        StandardCharsets.UTF_8, StandardOpenOption.CREATE_NEW);
) {
    String line;
    while ((line = reader.readLine()) != null)
        if (!file1Lines.contains(line)) {
            writer.write(line);
            writer.newLine();
        }
}

try {
    Files.move(tmpfile, file2, StandardCopyOption.REPLACE_EXISTING,
        StandardCopyOption.ATOMIC_MOVE);
} catch (AtomicMoveNotSupportedException ignored) {
    Files.move(tmpfile, file2, StandardCopyOption.REPLACE_EXISTING);
}
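
For anyone who wants to try this out, here is a minimal, self-contained sketch of the same steps wrapped in a method. The class name CsvDedup, the method name removeLinesFoundIn and the ".new" suffix derived from the target's file name are illustrative choices, not part of the original answer. It also assumes file1 fits in memory and that both files are UTF-8.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper class; the names are illustrative only.
final class CsvDedup {
    private CsvDedup() {}

    /** Removes from target every line that also appears in reference. */
    static void removeLinesFoundIn(Path reference, Path target) throws IOException {
        // Load the reference file's lines into a set for fast lookups.
        Set<String> referenceLines
            = new HashSet<>(Files.readAllLines(reference, StandardCharsets.UTF_8));

        // Temporary file next to the target, e.g. file2.csv.new.
        Path tmp = target.resolveSibling(target.getFileName() + ".new");

        try (
            BufferedReader reader
                = Files.newBufferedReader(target, StandardCharsets.UTF_8);
            // CREATE_NEW fails if the temporary file already exists.
            BufferedWriter writer = Files.newBufferedWriter(tmp,
                StandardCharsets.UTF_8, StandardOpenOption.CREATE_NEW)
        ) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Copy only the lines that are absent from the reference file.
                if (!referenceLines.contains(line)) {
                    writer.write(line);
                    writer.newLine();
                }
            }
        }

        // Replace the target with the filtered copy, atomically if possible.
        try {
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING,
                StandardCopyOption.ATOMIC_MOVE);
        } catch (AtomicMoveNotSupportedException e) {
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

Calling CsvDedup.removeLinesFoundIn(Paths.get("file1.csv"), Paths.get("file2.csv")) would then rewrite file2.csv. Because of CREATE_NEW, a temporary file left behind by an earlier failed run makes the next run fail instead of silently overwriting it, so you may want to clean it up yourself.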

If you use Java 8, you can use this try-with-resources block instead:

try (
    final Stream<String> stream = Files.lines(file2, StandardCharsets.UTF_8);
    final BufferedWriter writer = Files.newBufferedWriter(tmpfile,
        StandardCharsets.UTF_8, StandardOpenOption.CREATE_NEW);
) {
    stream.filter(line -> !file1Lines.contains(line))
        .forEach(line -> {
            // A forEach lambda cannot throw IOException; rethrow it unchecked.
            try {
                writer.write(line);
                writer.newLine();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
}
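
A lambda passed to forEach cannot throw a checked exception such as IOException, so writing from inside the lambda needs extra handling. One possible alternative, sketched here on the assumption that the filtered lines of file2 fit in memory, is to collect them into a list and let Files.write create the temporary file. The class name StreamCsvDedup and the method name removeLinesFoundIn are made up for this illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class; the names are illustrative only.
final class StreamCsvDedup {
    private StreamCsvDedup() {}

    /** Removes from target every line that also appears in reference. */
    static void removeLinesFoundIn(Path reference, Path target) throws IOException {
        Set<String> referenceLines
            = new HashSet<>(Files.readAllLines(reference, StandardCharsets.UTF_8));
        Path tmp = target.resolveSibling(target.getFileName() + ".new");

        // Collect the surviving lines; the stream is closed before the move.
        List<String> kept;
        try (Stream<String> lines = Files.lines(target, StandardCharsets.UTF_8)) {
            kept = lines.filter(line -> !referenceLines.contains(line))
                        .collect(Collectors.toList());
        }

        // Files.write terminates each line with the platform line separator.
        Files.write(tmp, kept, StandardCharsets.UTF_8, StandardOpenOption.CREATE_NEW);

        // Replace the target with the filtered copy, atomically if possible.
        try {
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING,
                StandardCopyOption.ATOMIC_MOVE);
        } catch (AtomicMoveNotSupportedException e) {
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

The trade-off is memory: this version holds the kept lines of file2 in a list, whereas writing line by line only needs the set built from file1.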
