Command line to match lines with matching first field (sed, awk, etc.)
Question
What is a fast and succinct way to extract the lines of a text file whose first field also appears on another line?
Sample input:
a|lorem
b|ipsum
b|dolor
c|sit
d|amet
d|consectetur
e|adipisicing
e|elit
Desired output:
b|ipsum
b|dolor
d|amet
d|consectetur
e|adipisicing
e|elit
Desired output, alternative:
b|ipsum|dolor
d|amet|consectetur
e|adipisicing|elit
I can imagine many ways to write this, but I suspect there's a smart way to do it, e.g., with sed, awk, etc. My source file is approximately 0.5 GB.
There are some related questions here, e.g., "awk | merge line on the basis of field matching", but the approach in that question loads too much content into memory. I need a streaming method.
Recommended answer
Here's a method where you only have to remember the previous line. Because it compares each line only with the one before it, the input file must already be sorted (or at least grouped) by the first field.
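awk -F '|' '
    # same key as the previous line: the previous line belongs to a group
    $1 == prev_key { print prev_line; matches++ }
    # key changed: flush the last line of the finished group, then reset
    $1 != prev_key {
        if (matches) print prev_line
        matches = 0
        prev_key = $1
    }
    { prev_line = $0 }
    # flush the last line of the file if it closed a group
    END { if (matches) print prev_line }
' filename
Output: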
b|ipsum
b|dolor
d|amet
d|consectetur
e|adipisicing
e|elit
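If the file is not already ordered by its first field, one option is to sort it first and pipe the result into the same awk program. The following is a minimal sketch, not part of the original answer: `dup_keys` is a hypothetical helper name, and it assumes a `sort` that spills to temporary files for large inputs (as common implementations do), so awk itself still holds only one line in memory.

```shell
#!/bin/sh
# Sketch: sort by the first field, then stream through the duplicate-key filter.
# dup_keys is a hypothetical helper name used only for this illustration.
dup_keys() {
    # LC_ALL=C gives byte-wise ordering, so identical keys end up adjacent
    LC_ALL=C sort -t '|' -k1,1 | awk -F '|' '
        $1 == prev_key { print prev_line; matches++ }
        $1 != prev_key {
            if (matches) print prev_line
            matches = 0
            prev_key = $1
        }
        { prev_line = $0 }
        END { if (matches) print prev_line }
    '
}

# Demo on unsorted input
printf '%s\n' 'd|amet' 'b|ipsum' 'a|lorem' 'd|consectetur' 'b|dolor' | dup_keys
```

Without a stable sort, lines that share a key may come out in a different relative order than in the input. GNU sort offers `-s` to keep the original order within each key.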
Alternative output:
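awk -F '|' '
    # same key as the previous line: append the previous value to the output line
    $1 == prev_key {
        if (matches == 0) printf "%s", $1
        printf "%s%s", FS, prev_value
        matches++
    }
    # key changed: finish the output line of the previous group, then reset
    $1 != prev_key {
        if (matches) printf "%s%s\n", FS, prev_value
        matches = 0
        prev_key = $1
    }
    { prev_value = $2 }
    # finish the output line if the file ended inside a group
    END { if (matches) printf "%s%s\n", FS, prev_value }
' filename
Output: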
b|ipsum|dolor
d|amet|consectetur
e|adipisicing|elit
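The merged form above keeps only the second field of each line. If records may contain more than two fields (an assumption; the sample data has exactly two), a variant can carry everything after the first separator instead. This is a sketch only, and `merge_keys` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: merge lines sharing a first field, keeping all remaining fields.
# merge_keys is a hypothetical helper name; input must be grouped by key.
merge_keys() {
    awk -F '|' '
        # everything after the first separator, so extra fields are preserved
        { rest = substr($0, length($1) + 2) }
        $1 == prev_key {
            if (matches == 0) printf "%s", $1
            printf "%s%s", FS, prev_rest
            matches++
        }
        $1 != prev_key {
            if (matches) printf "%s%s\n", FS, prev_rest
            matches = 0
            prev_key = $1
        }
        { prev_rest = rest }
        END { if (matches) printf "%s%s\n", FS, prev_rest }
    '
}

# Demo: prints b|y|2|z|3
printf '%s\n' 'a|x|1' 'b|y|2' 'b|z|3' | merge_keys
```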