How to move all strings in one file that match the lines of another to columns in an output file?
Problem description
I have two files, each with one column that look like this:
File 1
chr1 106623434
chr1 106623436
chr1 106623442
chr1 106623468
chr1 10699400
chr1 10699405
chr1 10699408
chr1 10699415
chr1 10699426
chr1 10699448
chr1 110611528
chr1 110611550
chr1 110611552
chr1 110611554
chr1 110611560
File 2
chr1 1066234
chr1 106994
chr1 1106115
I want to search file 1 with each line of file 2, pull out every line that contains that exact string, and put the matches into a new file. Each search's output should be in its own column, with the columns separated by tabs. I want to do this for every line in file 2. Hopefully the output will look something like this:
chr1 106623434   chr1 10699400    chr1 110611528
chr1 106623436   chr1 10699405    chr1 110611550
chr1 106623442   chr1 10699408    chr1 110611552
chr1 106623468   chr1 10699415    chr1 110611554
                 chr1 10699426    chr1 110611560
                 chr1 10699448
Solution
$ cat tst.awk
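# First file (file2): store each search string in input order.
NR==FNR { tgts[++numTgts] = $0; next }

# Second file (file1): test every line against every search string.
{
    for (tgtNr=1; tgtNr<=numTgts; tgtNr++) {
        tgt = tgts[tgtNr]
        # tgt is used as a regexp anchored at the start of the line
        if ($0 ~ "^"tgt) {
            numHits[tgtNr]++
            maxHits = (numHits[tgtNr] > maxHits ? numHits[tgtNr] : maxHits)
            hits[tgtNr,numHits[tgtNr]] = $0
        }
    }
}

# Print one row per hit number and one column per search string;
# columns with fewer hits are padded with blanks.
END {
    for (hitNr=1; hitNr<=maxHits; hitNr++) {
        for (tgtNr=1; tgtNr<=numTgts; tgtNr++) {
            printf "%-16s%s", hits[tgtNr,hitNr], (tgtNr<numTgts?OFS:ORS)
        }
    }
}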
$ awk -f tst.awk file2 file1
chr1 106623434   chr1 10699400    chr1 110611528
chr1 106623436   chr1 10699405    chr1 110611550
chr1 106623442   chr1 10699408    chr1 110611552
chr1 106623468   chr1 10699415    chr1 110611554
                 chr1 10699426    chr1 110611560
                 chr1 10699448
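The script above pads each column to 16 characters and treats every line of file2 as a regular expression. Since the question asks for tab-separated columns and exact strings, a possible variant is sketched below. It is not part of the original answer. The script name `tabs.awk`, the output name `out.tsv`, and the temporary working directory are assumptions made for illustration. It uses `index()` for a literal prefix test and writes a tab between columns.

```shell
# Work in a scratch directory so the sample files do not clobber anything.
workdir=$(mktemp -d) && cd "$workdir"

# Sample data copied from the question.
cat > file1 <<'EOF'
chr1 106623434
chr1 106623436
chr1 106623442
chr1 106623468
chr1 10699400
chr1 10699405
chr1 10699408
chr1 10699415
chr1 10699426
chr1 10699448
chr1 110611528
chr1 110611550
chr1 110611552
chr1 110611554
chr1 110611560
EOF

cat > file2 <<'EOF'
chr1 1066234
chr1 106994
chr1 1106115
EOF

# tabs.awk (hypothetical name): same two-pass idea as tst.awk, but with
# literal prefix matching and tab-separated output.
cat > tabs.awk <<'EOF'
# First file (file2): remember each search prefix in input order.
NR == FNR { tgts[++numTgts] = $0; next }

# Second file (file1): index() == 1 means the line starts with the prefix,
# compared as a plain string rather than as a regular expression.
{
    for (t = 1; t <= numTgts; t++) {
        if (index($0, tgts[t]) == 1) {
            hits[t, ++numHits[t]] = $0
            if (numHits[t] > maxHits) maxHits = numHits[t]
        }
    }
}

# One row per hit number, one tab-separated column per prefix;
# prefixes with fewer hits leave their column empty.
END {
    for (h = 1; h <= maxHits; h++)
        for (t = 1; t <= numTgts; t++)
            printf "%s%s", hits[t, h], (t < numTgts ? "\t" : "\n")
}
EOF

awk -f tabs.awk file2 file1 > out.tsv
```

With the sample data, `out.tsv` has six rows. Rows where the first prefix has run out of hits begin with an empty field, so the remaining values stay in their own columns when the file is read by a tab-aware tool.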