How to find duplicate lines across 2 different files? Unix


Question

From the Unix terminal, we can use diff file1 file2 to find the differences between two files. Is there a similar command that shows the lines two files have in common? (Multiple pipes are fine if necessary.)

Each file contains one sentence string per line; the files are sorted and duplicate lines have been removed with sort file1 | uniq.

file1 http://pastebin.com/taRcegVn

file2 http://pastebin.com/2fXeMrHQ

The output should contain the lines that appear in both files.

output http://pastebin.com/FnjXFshs

I am able to do it in Python as shown below, but I think it's a little too much to put into the terminal:

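# Collect the stripped lines of each file into a set
with open('wn-rb.dic') as f1, open('wn-s.dic') as f2:
    x = {line.strip() for line in f1}
    y = {line.strip() for line in f2}
# Lines present in both files
z = x.intersection(y)
# Write the common lines to the output file
with open('reverse-diff.out', 'w') as outfile:
    for line in z:
        print(line, file=outfile)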


Recommended answer

As @tjameson mentioned, this may already be solved in another thread. I would just like to post another solution:

sort file1 file2 | awk 'dup[$0]++ == 1'


  1. Refer to an awk guide for the basics: when the pattern for a line evaluates to true, that line is printed.

dup[$0] is a hash table in which each key is a line of the input. The value starts at 0 and is incremented each time the line occurs. When the line occurs a second time, the value before the increment is 1, so dup[$0]++ == 1 is true and the line is printed.
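To make the counting logic concrete, here is a minimal Python sketch that mimics what the awk pattern does. The function name lines_seen_twice and the sample data are made up for illustration and are not part of the original answer.

```python
from collections import defaultdict

def lines_seen_twice(lines):
    """Mimic awk 'dup[$0]++ == 1': yield a line on its second occurrence only."""
    dup = defaultdict(int)  # missing keys start at 0, like awk array entries
    for line in lines:
        seen_before = dup[line]  # value before the increment (awk post-increment)
        dup[line] += 1
        if seen_before == 1:
            yield line

# Hypothetical contents of two files, each already free of internal duplicates
file1 = ["apple", "banana", "cherry"]
file2 = ["banana", "cherry", "date"]

# "sort file1 file2" concatenates the lines of both files and sorts them
common = list(lines_seen_twice(sorted(file1 + file2)))
print(common)  # ['banana', 'cherry']
```

The counting itself does not depend on line order, so in this sketch the sort only determines the order in which the common lines are reported.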

Note that this only works when neither file contains duplicate lines within itself, as was specified in the question.
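This caveat can be illustrated with a small Python sketch; the sample data and the function name seen_twice are hypothetical. A line repeated inside a single file is reported even though the other file does not contain it, whereas deduplicating each input first (which the question does with sort file1 | uniq) avoids the false match.

```python
from collections import Counter

def seen_twice(lines):
    """Return the lines whose count reaches 2, in order of their second occurrence."""
    counts = Counter()
    result = []
    for line in lines:
        counts[line] += 1
        if counts[line] == 2:
            result.append(line)
    return result

file1 = ["apple", "apple", "banana"]  # "apple" is duplicated within file1
file2 = ["banana", "cherry"]

# Without per-file deduplication, "apple" is wrongly reported as common
naive = seen_twice(sorted(file1 + file2))
# Deduplicating each file first leaves only the true intersection
deduped = seen_twice(sorted(set(file1)) + sorted(set(file2)))

print(naive)    # ['apple', 'banana']
print(deduped)  # ['banana']
```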
