Removing lines containing a unique first field with awk?


Question

I'm looking to print only the lines whose first field is duplicated. For example, from data that looks like this:

1 abcd
1 efgh
2 ijkl
3 mnop
4 qrst
4 uvwx

Should print out:

1 abcd
1 efgh
4 qrst
4 uvwx

(FYI - first field is not always 1 character long in my data)

Solution

awk 'FNR==NR{a[$1]++;next}(a[$1] > 1)' ./infile ./infile

Yes, you give it the same file as input twice. Since you don't know ahead of time whether the current record is unique, you build up an array of counts keyed on $1 during the first pass, then on the second pass you output only the records whose $1 was seen more than once.

I'm sure there are ways to do it with only a single pass through the file, but I doubt they will be as "clean".
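
For comparison, here is a minimal sketch of one possible single-pass variant. It is not from the original answer: it assumes the whole input fits in memory, and the array names line, key and count and the temporary file are made up for illustration. It buffers every record while counting first fields, then replays the buffered records in the END block.

```shell
# Sample data from the question, written to a temporary file (illustrative path).
infile=$(mktemp)
printf '%s\n' '1 abcd' '1 efgh' '2 ijkl' '3 mnop' '4 qrst' '4 uvwx' > "$infile"

# Single pass: remember every record and count each first field,
# then print the remembered records in input order once all counts are known.
result=$(awk '
    { line[NR] = $0; key[NR] = $1; count[$1]++ }
    END {
        for (i = 1; i <= NR; i++)
            if (count[key[i]] > 1)
                print line[i]
    }
' "$infile")

printf '%s\n' "$result"
rm -f "$infile"
```

The trade-off is memory: this version holds every line until the end of input, whereas the two-pass version only stores one counter per distinct first field. On the other hand, it can read from a pipe, which cannot be given to awk twice.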

Explanation

  1. FNR==NR: This is only true while awk is reading the first file. It compares the total number of records read so far (NR) with the record number within the current file (FNR); the two are equal only in the first file.
  2. a[$1]++: Builds an associative array a whose key is the first field ($1) and whose value is incremented by one each time that key is seen.
  3. next: Skips the rest of the script and starts over with the next input record.
  4. (a[$1] > 1): This is only evaluated on the second pass over ./infile, and it prints only the records whose first field ($1) was seen more than once. Essentially, it is shorthand for if(a[$1] > 1){print $0}
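
The four steps above can be sketched as an expanded, commented form of the same program. This is only a rewrite of the one-liner for readability, assuming the sample data from the question; the shell variable names expanded and oneliner and the temporary file are made up for the example.

```shell
# Sample data from the question, written to a temporary file (illustrative path).
infile=$(mktemp)
printf '%s\n' '1 abcd' '1 efgh' '2 ijkl' '3 mnop' '4 qrst' '4 uvwx' > "$infile"

expanded=$(awk '
    # First pass: this rule fires only while reading the first file argument.
    FNR == NR {
        a[$1]++     # count how often each first field occurs
        next        # skip the rule below and read the next record
    }
    # Second pass: explicit form of the bare pattern (a[$1] > 1).
    {
        if (a[$1] > 1)
            print $0
    }
' "$infile" "$infile")

# The original one-liner, for comparison.
oneliner=$(awk 'FNR==NR{a[$1]++;next}(a[$1] > 1)' "$infile" "$infile")

printf '%s\n' "$expanded"
rm -f "$infile"
```

Both forms should print the same lines, since a pattern with no action defaults to printing the current record.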

Proof of Concept

$ cat ./infile
1 abcd
1 efgh
2 ijkl
3 mnop
4 qrst
4 uvwx

$ awk 'FNR==NR{a[$1]++;next}(a[$1] > 1)' ./infile ./infile
1 abcd
1 efgh
4 qrst
4 uvwx
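
Since the question mentions that the first field is not always one character long, here is a small additional check. The input lines are invented for illustration and are not from the original post. awk array subscripts are strings, so the count is kept per whole first field, and 1, 10 and 100 are treated as different keys.

```shell
# Hypothetical data with first fields of varying length.
infile=$(mktemp)
printf '%s\n' '10 abcd' '100 efgh' '10 ijkl' 'alpha mnop' 'alpha qrst' '1 uvwx' > "$infile"

# Same two-pass command as in the answer, run on the new data.
dupes=$(awk 'FNR==NR{a[$1]++;next}(a[$1] > 1)' "$infile" "$infile")

printf '%s\n' "$dupes"
rm -f "$infile"
```

Only the lines starting with 10 and alpha are printed, because 100 and 1 each occur once.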
