How can I find if a row exists in a file and add a column with the filename using awk?
Question
I'm trying to find out whether a row in one file already exists in another file and, if it does, add a column with the filename.
file1:
CHROM POS REF ALT
chr1 10 T A
chr1 12 T G
chr1 12 T C
file2:
CHROM POS REF ALT
chr1 12 T C
chr1 13 A T
I want to check whether any row in file2 is also in file1.
Expected output:
CHROM POS REF ALT
chr1 10 T A
chr1 12 T G
chr1 12 T C file2
I tried the following code:
`awk -F"\t" 'FNR==NR
{
seen[$0];next
}($0 in seen)
{
delete seen[$0]
};
END{
for (x in seen);$(NF+1)="file";print
}
{print}' OFS="\t" file2 file1`
But this is not working as expected. This is what I'm getting:
CHROM POS REF ALT
chr1 10 T A
chr1 12 T G
chr1 12 T C
chr1 12 T C file2
How can I delete the duplicated row? Thanks!
Recommended answer
Could you please try the following:
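awk '
# Header line of the first file (file2): print it once.
FNR==1 && FNR==NR{
  print
  next
}
# Remaining lines of file2: remember each row and the file it came from.
FNR==NR{
  a[$0]=FILENAME
  next
}
# Data lines of file1: append the filename only when the row was seen in file2.
FNR>1{
  print $0 (($0 in a) ? OFS a[$0] : "")
}' file2 file1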
The output will be as follows.
CHROM POS REF ALT
chr1 10 T A
chr1 12 T G
chr1 12 T C file2
NOTE: If the input files are TAB-delimited and the output needs to be TAB-delimited too, add a BEGIN section at the start of the awk program, like awk 'BEGIN{FS=OFS="\t"} ....
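For TAB-delimited input, the whole thing could look roughly like the sketch below. It is only an illustration under some assumptions: the scratch directory, the printf-generated sample files and the array name seen are made up for this example, and it is a small variant of the answer in that the header is printed from file1 rather than file2.

```shell
#!/bin/sh
# Hypothetical scratch directory holding TAB-delimited copies of the sample data.
tmpdir=$(mktemp -d)
printf 'CHROM\tPOS\tREF\tALT\n' > "$tmpdir/file1"
printf 'chr1\t10\tT\tA\nchr1\t12\tT\tG\nchr1\t12\tT\tC\n' >> "$tmpdir/file1"
printf 'CHROM\tPOS\tREF\tALT\nchr1\t12\tT\tC\nchr1\t13\tA\tT\n' > "$tmpdir/file2"

# Run awk from inside the directory so FILENAME is the bare name "file2".
result=$(cd "$tmpdir" && awk '
BEGIN { FS = OFS = "\t" }            # read and write TAB-delimited fields
FNR == NR {                          # first file on the command line (file2)
  if (FNR > 1) seen[$0] = FILENAME   # remember data rows, skip the header
  next
}
FNR == 1 { print; next }             # header of file1, printed unchanged
{ print $0 (($0 in seen) ? OFS seen[$0] : "") }   # add the column only on a match
' file2 file1)

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Building the output line by concatenation, instead of passing two arguments to print, keeps rows without a match free of a trailing TAB.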