What is the fastest way to compare two text files, not counting moved lines as different?

Problem description

I have two very large files, say 50,000 lines each. I need to compare them and identify the changes. The catch is that if a line is present in both files but at a different position, it should not be reported as different.

For example, consider this:

File A.txt

xxxxx
yyyyy
zzzzz    

File B.txt

zzzzz
xxxx
yyyyy  

If these are the contents of the files, my code should output xxxx (or both xxxx and xxxxx).

Of course, the easiest way would be to store each line of one file in a List<String> and compare it with the other file's List<String>.

But this seems to take a lot of time. I have also tried using DiffUtils in Java, but it does not recognize lines that appear at different line numbers as the same. Is there any other algorithm that might help me?

Recommended answer

Using a Set is probably the easiest way:

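import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;
import org.apache.commons.io.FileUtils; // assumed: FileUtils is the Apache Commons IO class

// file1 and file2 are java.io.File objects; UTF-8 is an assumed encoding
Set<String> set1 = new HashSet<String>(FileUtils.readLines(file1, StandardCharsets.UTF_8));
Set<String> set2 = new HashSet<String>(FileUtils.readLines(file2, StandardCharsets.UTF_8));

// lines that occur in both files, regardless of position
Set<String> similars = new HashSet<String>(set1);
similars.retainAll(set2);

set1.removeAll(similars); // set1 now holds the lines found only in file1
set2.removeAll(similars); // set2 now holds the lines found only in file2
System.out.println(set1); // prints the lines found only in file1
System.out.println(set2); // prints the lines found only in file2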
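
For readers who want to try the idea without an extra library, here is a minimal sketch using only the JDK. The class name `MovedLineDiff` and the method `onlyIn` are hypothetical names chosen for illustration and are not part of the original answer. The sample data is the A.txt / B.txt content from the question, held in memory; for real files the lists could come from `java.nio.file.Files.readAllLines`. Because lookups in a `HashSet` take constant time on average, the comparison is roughly linear in the total number of lines, rather than scanning one list for every line of the other.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class; the names are illustrative only.
class MovedLineDiff {

    // Returns the lines of `source` that do not occur anywhere in `other`,
    // in first-seen order and without duplicates.
    static Set<String> onlyIn(Collection<String> source, Collection<String> other) {
        Set<String> otherLines = new HashSet<>(other);     // fast membership checks
        Set<String> result = new LinkedHashSet<>(source);  // preserves file order
        result.removeAll(otherLines);
        return result;
    }

    public static void main(String[] args) {
        // Sample data from the question; real input could be read with
        // java.nio.file.Files.readAllLines(path).
        List<String> a = Arrays.asList("xxxxx", "yyyyy", "zzzzz");
        List<String> b = Arrays.asList("zzzzz", "xxxx", "yyyyy");

        System.out.println("Only in A: " + onlyIn(a, b)); // Only in A: [xxxxx]
        System.out.println("Only in B: " + onlyIn(b, a)); // Only in B: [xxxx]
    }
}
```

Two caveats apply to any set-based comparison like this. Lines are compared exactly, so trailing whitespace makes otherwise identical lines differ. A set also ignores how many times a line occurs, so a line that appears twice in one file and once in the other is not reported.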
