Intersection of files

Views: 160 Published: 2015/11/30 15:11:46 algorithm file unix intersection

Problem description

I have two large files (27k lines and 450k lines). They look something like this:

File1:
1 2 A 5
3 2 B 7
6 3 C 8
...

File2:
4 2 C 5
7 2 B 7
6 8 B 8
7 7 F 9
...

I want the lines from both files whose 3rd-column value appears in both files (note that the lines with A and F are excluded):

OUTPUT:
3 2 B 7
6 3 C 8
4 2 C 5
7 2 B 7
6 8 B 8

What's the best way?

Solution

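# Unique column-3 values of each file
awk '{print $3}' file1 | sort -u > file1col3
awk '{print $3}' file2 | sort -u > file2col3
# Intersect the two lists, then turn each shared value into a whole-line regex
grep -Fx -f file1col3 file2col3 | awk '{print "\\w+ \\w+ " $1 " \\w+"}' > col3regexp
# Print the matching lines of both files, without filename prefixes
grep -E -xh -f col3regexp file1 file2

This grabs all the unique column-3 values in the two files, intersects them (using grep -F), and writes one regular expression per shared value that matches a whole line with that value in column 3. It then uses grep -E (equivalent to the older egrep) to extract the matching lines from the two files: -x requires the whole line to match and -h suppresses the filename prefix. Note that the generated patterns assume exactly four space-separated fields per line, as in the samples.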
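
To make the intermediate files concrete, here is a minimal sketch that runs the same pipeline on the sample rows from the question. The scratch directory and the `result` variable are illustrative additions, not part of the original answer, the "..." lines are left out of the sample files, and `\w` is assumed to be supported by the local grep (it is in GNU grep).

```shell
#!/bin/sh
# Work in a scratch directory so no existing files are overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample data from the question (the "..." lines are left out).
printf '%s\n' '1 2 A 5' '3 2 B 7' '6 3 C 8' > file1
printf '%s\n' '4 2 C 5' '7 2 B 7' '6 8 B 8' '7 7 F 9' > file2

# Unique column-3 values per file.
awk '{print $3}' file1 | sort -u > file1col3    # A, B, C
awk '{print $3}' file2 | sort -u > file2col3    # B, C, F

# Keep the values present in both lists and emit one regex per value.
grep -Fx -f file1col3 file2col3 |
    awk '{print "\\w+ \\w+ " $1 " \\w+"}' > col3regexp

# Show the generated patterns, then extract the matching lines:
# -x matches whole lines only, -h drops the filename prefix.
cat col3regexp
result=$(grep -E -xh -f col3regexp file1 file2)
printf '%s\n' "$result"
```

With these inputs, col3regexp should hold one pattern for B and one for C, and the final command should print the five lines listed under OUTPUT in the question, file1's matches first and then file2's.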
