Compare column1 in File with column1 in File2, output {Column1 File1} that does not exist in file 2


Problem description

Below is the content of my file 1:

123|yid|def|
456|kks|jkl|
789|mno|vsasd|

This is the content of my file 2:

123|abc|def|
456|ghi|jkl|
789|mno|pqr|
134|rst|uvw|

The only thing I want to compare between file 1 and file 2 is column 1. Based on the files above, the output should contain only:

134|rst|uvw|

A line-by-line comparison is not the answer, because columns 2 and 3 contain different values in the two files; only column 1 contains exactly the same values in both.

How can I achieve this?

Currently I'm using this in my code:

#sort FILEs first before comparing

sort $FILE_1 > $FILE_1_sorted
sort $FILE_2 > $FILE_2_sorted

for oid in $(cat $FILE_1_sorted |awk -F"|" '{print $1}');
do
echo "output oid $oid"

#for every oid in FILE 1, compare it with oid FILE 2 and output the difference

grep -v diff "^${oid}|" $FILE_1 $FILE_2 | grep \< | cut -d \  -f 2 > $FILE_1_tmp

Recommended answer

You can do this in Awk very easily!

awk 'BEGIN{FS=OFS="|"}FNR==NR{unique[$1]; next}!($1 in unique)' file1 file2

Awk works by processing input one line at a time. It also provides two special clauses, BEGIN{} and END{}, which enclose actions to run before and after the input files are processed.
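As a rough illustration of that processing order, the sketch below counts the lines of a two-line input. The variable names `summary` and `n` are made up for this example and are not part of the answer's command.

```shell
# BEGIN runs once before any input, the middle rule once per line, END once at the end.
summary=$(printf '123|abc|def|\n456|ghi|jkl|\n' |
  awk 'BEGIN { print "start" }    # before the first line is read
       { n++ }                     # once for every input line
       END { print n " lines" }')  # after the last line has been read
printf '%s\n' "$summary"
```

With the two input lines shown, this should print "start" followed by "2 lines".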

The BEGIN{FS=OFS="|"} part therefore runs before any file processing happens. FS and OFS are special Awk variables that hold the input and output field separators. Since the files you provided are delimited by |, you need to set FS="|" to parse them, and set OFS="|" so that fields are joined with | again on output.
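The effect of OFS is easiest to see when a field is changed, because Awk then rebuilds the line from its fields. The sketch below is only an illustration: the replacement value `X` and the variable names `line`, `with_ofs` and `without_ofs` are assumptions made for the demo. Note that the answer's command prints lines unmodified, so there OFS acts mainly as a safeguard.

```shell
line='123|abc|def|'

# FS splits the line on "|"; assigning to $2 makes awk rebuild the line using OFS.
with_ofs=$(printf '%s\n' "$line" | awk 'BEGIN { FS = OFS = "|" } { $2 = "X"; print }')

# With only FS set, the rebuilt line is joined with the default OFS, a single space.
without_ofs=$(printf '%s\n' "$line" | awk 'BEGIN { FS = "|" } { $2 = "X"; print }')

printf '%s\n%s\n' "$with_ofs" "$without_ofs"
```

The first result keeps the pipe-delimited layout (`123|X|def|`), while the second comes back space-separated.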

The main part of the command comes after the BEGIN clause. The condition FNR==NR holds while the first file argument is being read, because NR counts lines across all input files combined while FNR counts lines in the current file only, so the two stop being equal once the second file starts (assuming the first file is not empty). For each line of the first file, $1 is stored as a key in the array called unique, and next skips the rest of the program for that line. When the second file is processed, the pattern !($1 in unique) prints only those lines whose $1 value is not in the array, and drops the rest.
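To try this end to end, the following sketch recreates the two sample files from the question in a scratch directory and runs the same Awk program, spread over several lines with comments. The `workdir` and `result` names and the use of `mktemp -d` are assumptions made for the demo.

```shell
# Recreate the sample files from the question in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' '123|yid|def|' '456|kks|jkl|' '789|mno|vsasd|' > "$workdir/file1"
printf '%s\n' '123|abc|def|' '456|ghi|jkl|' '789|mno|pqr|' '134|rst|uvw|' > "$workdir/file2"

result=$(awk '
  BEGIN { FS = OFS = "|" }
  # First file: the overall count NR still equals the per-file count FNR.
  FNR == NR { unique[$1]; next }   # remember the key, then skip to the next line
  # Second file: print only lines whose first column was not recorded.
  !($1 in unique)
' "$workdir/file1" "$workdir/file2")

printf '%s\n' "$result"   # prints: 134|rst|uvw|
rm -rf "$workdir"
```

The order of the file arguments matters: the file whose keys should be remembered goes first, and the file to be filtered goes second.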
