Merge/join two tables fast on the Linux command line


Question

Say I have two relatively large tab-delimited files, file1.txt and file2.txt.

file1.txt
id\tcity\tcar\ttype\tmodel

file2.txt 
id\tname\trating

Suppose file1.txt has 2000 unique ids (and therefore 2000 unique rows), and file2.txt has only 1000 unique ids (and therefore 1000 unique rows). Is there a way to merge the two tables?

Case 1. Merge them by the ids in file1.txt; where an id has no match in file2.txt, the missing fields are filled with NA.

Case 2. Merge them by the ids in file2.txt, so that only the ids in file2.txt are printed, with the fields from both file1.txt and file2.txt.

Note: the merged file should also be tab-delimited and should keep a header row. Note 2: I'd also appreciate suggestions on how to do this when there is no header.

Thanks!

Recommended answer

join -j 1 <(sort file1.txt) <(sort file2.txt)

This handles your 'Case 2' using only standard Unix tools. If the files are already sorted on the id field, you can drop the sort.
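Case 1 from the question (keep every id from file1.txt and fill the gaps with NA) is not covered by the command above. A minimal sketch, assuming GNU coreutils join (the -o auto option is a GNU extension) and header-less sample rows made up for illustration:

```shell
# Scratch directory so the sample files do not overwrite real ones.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Header-less sample rows (hypothetical data in the question's layout).
printf '1\tyork\tsubaru\timpreza\tking\n2\tkampala\ttoyota\tcorolla\tsissy\n3\tluzern\tchrysler\tgravity\tfalcon\n' > file1.txt
printf '3\tzanzini\tPG\n2\ttara\tX\n' > file2.txt

# join expects both inputs sorted on the join field in the same collation.
LC_ALL=C sort -k1,1 file1.txt > file1.sorted
LC_ALL=C sort -k1,1 file2.txt > file2.sorted

# -a 1    : also print file1 rows that have no partner in file2
# -e NA   : write NA in place of each missing field
# -o auto : (GNU extension) pad every row to the full column count, so -e takes effect
case1=$(LC_ALL=C join -a 1 -e NA -o auto file1.sorted file2.sorted)
printf '%s\n' "$case1"
```

With these rows, id 1 has no match in file2.txt, so its last two columns come out as NA. The output is space-separated because that is join's default output separator.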

If the files include headers, you can rely on the ids being numeric to sort the joined header line back to the top (sort -n treats the non-numeric "id" as 0, so it lands before any positive id):

join -j 1 <(sort file1.txt) <(sort file2.txt) | sort -n
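The sort -n trick depends on the ids being numeric. If GNU coreutils is available, join --header is an alternative: it treats the first line of each input as a header and prints the joined header without trying to pair it. A sketch under that assumption, sorting only the body of each file (sort_body is a helper name introduced here for illustration):

```shell
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Sample files with header rows, in the question's layout.
printf 'id\tcity\tcar\ttype\tmodel\n1\tyork\tsubaru\timpreza\tking\n2\tkampala\ttoyota\tcorolla\tsissy\n3\tluzern\tchrysler\tgravity\tfalcon\n' > file1.txt
printf 'id\tname\trating\n3\tzanzini\tPG\n2\ttara\tX\n' > file2.txt

# Hypothetical helper: emit the header line untouched, then the sorted body.
sort_body() {
    head -n 1 "$1"
    tail -n +2 "$1" | LC_ALL=C sort -k1,1
}

sort_body file1.txt > file1.sorted
sort_body file2.txt > file2.sorted

# --header (GNU extension): join the first lines as headers, then the sorted bodies.
with_header=$(LC_ALL=C join --header file1.sorted file2.sorted)
printf '%s\n' "$with_header"
```

Because the header never passes through sort, this also works when the ids are not numbers.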

Using

  • file1.txt

    id  city    car type    model
    1   york    subaru  impreza king
    2   kampala toyota  corolla sissy
    3   luzern  chrysler    gravity falcon

  • file2.txt


    id  name    rating
    3   zanzini PG
    2   tara    X
    

  • Output:

    id  city    car type    model   name    rating
    2   kampala toyota  corolla sissy   tara    X
    3   luzern  chrysler    gravity falcon  zanzini PG
    

  • PS: To preserve the TAB separator character, pass the -t option:

     join -t'    ' ...
    

    It's hard to show on SO that the quotes ' ' contain a literal TAB character. Type it with ^V TAB (Ctrl-V, then Tab), e.g. in bash.
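In a script, the TAB can be generated instead of typed. The sketch below stores it in a variable with printf, which should work in any POSIX shell; in bash, $'\t' is an equivalent spelling. The sample rows are invented, and the city "new york" contains a space to show why the separator matters: with the default blank-separated parsing it would be split into two fields.

```shell
workdir=$(mktemp -d) && cd "$workdir" || exit 1

tab=$(printf '\t')    # literal TAB; in bash, $'\t' is an equivalent spelling

# Hypothetical header-less rows; note the space inside "new york".
printf '1\tnew york\tsubaru\n2\tkampala\ttoyota\n' > file1.txt
printf '2\ttara\n1\tzanzini\n' > file2.txt

# Tell sort and join alike that fields are split on TAB only.
LC_ALL=C sort -t "$tab" -k1,1 file1.txt > file1.sorted
LC_ALL=C sort -t "$tab" -k1,1 file2.txt > file2.sorted

# With -t, join also uses TAB as the output separator.
joined=$(LC_ALL=C join -t "$tab" file1.sorted file2.sorted)
printf '%s\n' "$joined"
```

Passing the same separator to sort keeps the sort key and the join key identical, which join requires of its inputs.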
