How to move all strings in one file that match the lines of another to columns in an output file?

Problem description

I have two files, each with a single column, that look like this:

File 1

chr1 106623434
chr1 106623436
chr1 106623442
chr1 106623468
chr1 10699400
chr1 10699405
chr1 10699408
chr1 10699415
chr1 10699426
chr1 10699448
chr1 110611528
chr1 110611550
chr1 110611552
chr1 110611554
chr1 110611560

File 2

chr1 1066234
chr1 106994
chr1 1106115

I want to search file 1 using each line of file 2 as the search string, pull out every line of file 1 that contains that exact string, and write the results to a new file. The matches for each search string should go in their own column, with columns separated by tabs. I want to do this for every line in file 2. Ideally the output would look something like this:

chr1 106623434  chr1 10699400   chr1 110611528
chr1 106623436  chr1 10699405   chr1 110611550
chr1 106623442  chr1 10699408   chr1 110611552
chr1 106623468  chr1 10699415   chr1 110611554
                chr1 10699426   chr1 110611560
                chr1 10699448

Solution

$ cat tst.awk
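# First file (file2): store each search string.
NR==FNR { tgts[++numTgts] = $0; next }
# Second file (file1): record each line under every search string it starts with.
{
    for (tgtNr=1; tgtNr<=numTgts; tgtNr++) {
        tgt = tgts[tgtNr]
        if ($0 ~ "^"tgt) {    # tgt is used as a regexp anchored at line start
            numHits[tgtNr]++
            maxHits = (numHits[tgtNr] > maxHits ? numHits[tgtNr] : maxHits)
            hits[tgtNr,numHits[tgtNr]] = $0
        }
    }
}
# Print one row per hit number, one padded column per search string.
END {
    for (hitNr=1; hitNr<=maxHits; hitNr++) {
        for (tgtNr=1; tgtNr<=numTgts; tgtNr++) {
            printf "%-16s%s", hits[tgtNr,hitNr], (tgtNr<numTgts?OFS:ORS)
        }
    }
}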

$ awk -f tst.awk file2 file1
chr1 106623434   chr1 10699400    chr1 110611528
chr1 106623436   chr1 10699405    chr1 110611550
chr1 106623442   chr1 10699408    chr1 110611552
chr1 106623468   chr1 10699415    chr1 110611554
                 chr1 10699426    chr1 110611560
                 chr1 10699448
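
The question asks for an "exact string" match and tab-separated columns, while the script above treats each line of file2 as a regular expression (so a character such as `.` would act as a metacharacter) and pads columns to a fixed width. The following is only a minimal sketch of one possible variant, assuming a literal prefix match via `index()` and a tab between columns is what is wanted. The wrapper function name `match_columns` is hypothetical and not part of the original answer.

```shell
#!/bin/sh
# Hypothetical variant of tst.awk: literal prefix match, tab-separated columns.
# match_columns is an illustrative name, not something from the original answer.
match_columns() {
    # $1 = file holding the search strings (file2), $2 = file to search (file1)
    awk '
    NR == FNR { tgts[++numTgts] = $0; next }      # first file: store each search string
    {
        for (tgtNr = 1; tgtNr <= numTgts; tgtNr++) {
            if (index($0, tgts[tgtNr]) == 1) {    # literal match at the start of the line
                n = ++numHits[tgtNr]
                if (n > maxHits) maxHits = n
                hits[tgtNr, n] = $0
            }
        }
    }
    END {
        # one row per hit number; missing hits print as empty cells
        for (hitNr = 1; hitNr <= maxHits; hitNr++)
            for (tgtNr = 1; tgtNr <= numTgts; tgtNr++)
                printf "%s%s", hits[tgtNr, hitNr], (tgtNr < numTgts ? "\t" : "\n")
    }' "$1" "$2"
}
```

Calling `match_columns file2 file1 > output.tsv` would then write the columns to a file. Because short columns leave empty cells rather than padding, the result lines up in a spreadsheet or with `column -t -s "$(printf '\t')"`, but not necessarily in a plain terminal.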
