How to conditionally filter rows in awk
Problem description
I am new to awk on Linux. I have a large text file with 17 million rows. The first column is the subject ID and the second column is Age. Each subject may have multiple ages, and I want to keep only the minimum age for each subject and print the results to a separate text file. I am not sure whether the rows are sorted by subject ID in the first column. These are the first few rows:
ID Age
16214497 36.000
16214497 63.000
16214727 63.000
16214781 71.000
16214781 79.000
16214792 67.000
16214860 79.000
16214862 62.000
16214874 61.000
Recommended answer
Try this (just awk, no pipes, using memory to retain the values):
$ awk '
NR==1 {print; next}                   # ¹
!($1 in arr) {arr[$1]=$2; next}       # ²
($2 < arr[$1]) {arr[$1]=$2}           # ³
END{for (i in arr) {print i, arr[i]}} # ⁴
' file
The same command on a single line (if the multi-line form looks intimidating):
awk 'NR==1{print; next} !($1 in arr){arr[$1]=$2; next} ($2 < arr[$1]){arr[$1]=$2} END{for (i in arr) {print i, arr[i]}}' file
(The multi-line version with newlines and comments works just as well; it is up to you.)
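- ¹ Print the first line (the header), then skip to the next line.
- ² If arr[key] has no value yet, store the second column in arr[key]. The array is created on the fly, with the first column as the key.
- ³ If the second column is less than arr[key], assign the new value of the second column to arr[key].
- ⁴ At the END, after all lines have been processed, print the keys and values of the array. The order in which `for (i in arr)` visits the keys is unspecified in awk, so the rows may not come out sorted.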
Output:
ID Age
16214497 36.000
16214727 63.000
16214781 71.000
16214792 67.000
16214860 79.000
16214862 62.000
16214874 61.000
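The question also asks for the result in a separate text file. The following is a minimal sketch of one way to do that, not the answer's own method: it reuses the same per-ID minimum idea, but sends the data rows through `sort` so that the output order does not depend on awk's array traversal. The file names `ages.txt` and `min_ages.txt` and the array name `min` are hypothetical, and the sample rows are the ones from the question.

```shell
#!/bin/sh
# Sample input copied from the question; ages.txt is a hypothetical file name.
cat > ages.txt <<'EOF'
ID Age
16214497 36.000
16214497 63.000
16214727 63.000
16214781 71.000
16214781 79.000
16214792 67.000
16214860 79.000
16214862 62.000
16214874 61.000
EOF

# Write the header first, then the per-ID minimum ages sorted by ID.
# min_ages.txt is a hypothetical output file name.
{
  head -n 1 ages.txt
  tail -n +2 ages.txt | awk '
    # First time this ID is seen, or a smaller age than the one stored:
    # remember the age. The "+ 0" forces a numeric comparison.
    !($1 in min) || $2 + 0 < min[$1] + 0 { min[$1] = $2 }
    # After the last row, emit one line per ID (in unspecified order).
    END { for (id in min) print id, min[id] }
  ' | sort -k1,1n
} > min_ages.txt

cat min_ages.txt
```

Memory use grows with the number of distinct IDs rather than with the number of rows, since only one value per ID is kept in the array.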