Merge/join two tables fast on the Linux command line
Problem description
Let us say I have two relatively large tab-delimited files, file1.txt and file2.txt, with these columns:
file1.txt
id\tcity\tcar\ttype\tmodel
file2.txt
id\tname\trating
Suppose file1.txt has 2000 unique ids (and therefore 2000 unique rows), while file2.txt has only 1000 unique ids (and therefore 1000 unique rows). Is there a way to merge the two tables?
Case 1: merge them by the ids in file1.txt, filling in NA wherever an id has no match in file2.txt.
Case 2: merge them by the ids in file2.txt, so that only the ids in file2.txt are printed, together with their fields from both file1.txt and file2.txt.
Note: the new merged files should also be tab-delimited, with a header line as well. Note 2: I'd also appreciate suggestions on how to do it when there is no header.
Thanks!
Recommended answer
join -j 1 <(sort file1.txt) <(sort file2.txt)
This does your 'case 2' using only standard Unix tools: by default, join prints only the lines whose id appears in both files. Of course, if the files are already sorted, you can drop the sort calls.
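For 'case 1' (keep every id from file1.txt and write NA where file2.txt has no match), a possible sketch relies on join's -a 1, -e NA and -o auto options. It assumes bash and GNU coreutils join (-o auto is a GNU option), header-less tab-delimited input, and reuses the sample rows from this answer's example; the scratch directory and the output name case1.tsv are illustrative choices only.

```shell
#!/usr/bin/env bash
# Sketch of "case 1" (left join): every id from file1.txt is kept, and NA is
# written where file2.txt has no matching id. Assumes bash and GNU join.
set -euo pipefail
export LC_ALL=C   # make sort and join use the same byte-order collation
tab=$'\t'

cd "$(mktemp -d)"  # scratch directory so real files are not overwritten
printf '1\tyork\tsubaru\timpreza\tking\n2\tkampala\ttoyota\tcorolla\tsissy\n3\tluzern\tchrysler\tgravity\tfalcon\n' > file1.txt
printf '3\tzanzini\tPG\n2\ttara\tX\n' > file2.txt

# -t "$tab" : fields are separated by TAB on input and on output
# -a 1      : also print lines of file 1 that have no match in file 2
# -e NA     : fill the missing fields of those lines with NA
# -o auto   : take the field count from each file's first line, so -e applies
join -t "$tab" -a 1 -e NA -o auto \
  <(sort -t "$tab" -k1,1 file1.txt) \
  <(sort -t "$tab" -k1,1 file2.txt) > case1.tsv

cat case1.tsv
```

Id 1 has no row in file2.txt, so its name and rating columns come out as NA.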
If the files include their headers, you can rely on the ids being (positive) numbers: sort -n treats the non-numeric header line as 0, so the joined header sorts to the top:
join -j 1 <(sort file1.txt) <(sort file2.txt) | sort -n
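If the ids are not numeric, one alternative (not part of the original answer) is to join the header lines and the data rows separately, taking the headers with head -n 1 and the rows with tail -n +2. This sketch assumes bash, tab-delimited files whose key is column 1, and the example rows from this answer; merged.tsv is a hypothetical output name.

```shell
#!/usr/bin/env bash
# Sketch: inner join ("case 2") of two tab-delimited files that each start
# with a header line, keeping the header on top without relying on numeric ids.
set -euo pipefail
export LC_ALL=C   # same collation for sort and join
tab=$'\t'

cd "$(mktemp -d)"  # scratch directory for the sample files
printf 'id\tcity\tcar\ttype\tmodel\n1\tyork\tsubaru\timpreza\tking\n2\tkampala\ttoyota\tcorolla\tsissy\n3\tluzern\tchrysler\tgravity\tfalcon\n' > file1.txt
printf 'id\tname\trating\n3\tzanzini\tPG\n2\ttara\tX\n' > file2.txt

{
  # Join the two header lines on their shared first field ("id").
  join -t "$tab" <(head -n 1 file1.txt) <(head -n 1 file2.txt)
  # Join the data rows (everything after line 1), each sorted on column 1.
  join -t "$tab" \
    <(tail -n +2 file1.txt | sort -t "$tab" -k1,1) \
    <(tail -n +2 file2.txt | sort -t "$tab" -k1,1)
} > merged.tsv

cat merged.tsv
```

Because the header is written first by construction, no final sort is needed and the output stays tab-delimited.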
For example, using:
file1.txt
id city car type model
1 york subaru impreza king
2 kampala toyota corolla sissy
3 luzern chrysler gravity falcon
file2.txt
id name rating
3 zanzini PG
2 tara X
output:
id city car type model name rating
2 kampala toyota corolla sissy tara X
3 luzern chrysler gravity falcon zanzini PG
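To reproduce the output above, here is a self-contained sketch that recreates the two sample files and runs the answer's command. Writing the files tab-delimited is an assumption (the page shows spaces), and exporting LC_ALL=C is an addition so that both sorts and join agree on the ordering; joined.txt is a hypothetical file name.

```shell
#!/usr/bin/env bash
# Reproduces the example: join on column 1, then sort -n to lift the header.
set -euo pipefail
export LC_ALL=C   # in the C locale digits sort before "id", consistently for sort and join

cd "$(mktemp -d)"  # scratch directory for the sample files
printf 'id\tcity\tcar\ttype\tmodel\n1\tyork\tsubaru\timpreza\tking\n2\tkampala\ttoyota\tcorolla\tsissy\n3\tluzern\tchrysler\tgravity\tfalcon\n' > file1.txt
printf 'id\tname\trating\n3\tzanzini\tPG\n2\ttara\tX\n' > file2.txt

# The answer's command. Without -t, join splits on blanks and separates the
# output fields with single spaces; sort -n reads the "id" line as 0.
join -j 1 <(sort file1.txt) <(sort file2.txt) | sort -n > joined.txt

cat joined.txt
```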
PS: To preserve the TAB separator character, pass the -t option:
join -t' ' ...
It's hard to show on SO that the ' ' above contains a TAB character. In bash, for example, type it as ^V TAB (Ctrl-V, then Tab).
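In bash, ANSI-C quoting ($'\t') also yields a TAB without typing one literally. The sketch below uses it and shows why -t matters when a field contains a space; the rows (with "new york") and the file names a.tsv, b.tsv, default.out and tab.out are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: compare join's default blank splitting with -t $'\t'.
set -euo pipefail
export LC_ALL=C

cd "$(mktemp -d)"  # scratch directory for the sample files
printf '1\tnew york\tsubaru\n' > a.tsv   # hypothetical row with a space in a field
printf '1\tzanzini\tPG\n' > b.tsv

# Default: input split on blanks (tabs and spaces), output joined by spaces,
# so "new york" becomes two fields and no TABs survive.
join a.tsv b.tsv > default.out
# -t $'\t': input split on TAB only, output joined by TAB.
join -t $'\t' a.tsv b.tsv > tab.out

# Count fields when each result is read back as tab-delimited text.
awk -F'\t' '{ print NF }' default.out   # 1
awk -F'\t' '{ print NF }' tab.out       # 5
```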