Command line to match lines with matching first field (sed, awk, etc.)
Problem description
What is a fast and succinct way to find the lines in a text file whose first field matches that of another line?
Sample input:
a|lorem
b|ipsum
b|dolor
c|sit
d|amet
d|consectetur
e|adipisicing
e|elit
Desired output:
b|ipsum
b|dolor
d|amet
d|consectetur
e|adipisicing
e|elit
Desired output, alternative:
b|ipsum|dolor
d|amet|consectetur
e|adipisicing|elit
I can imagine many ways to write this, but I suspect there's a smart way to do it, e.g., with sed, awk, etc. My source file is approx. 0.5 GB.
There are some related questions here, e.g., "awk | merge line on the basis of field matching", but that other question loads too much content into memory. I need a streaming method.
Recommended answer
Here's a method where you only have to remember the previous line. It therefore requires the input file to be sorted, or at least grouped so that lines with the same first field are adjacent.
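awk -F \| '
    # Same key as the previous line: the previous line belongs to a group.
    $1 == prev_key {print prev_line; matches++}
    # New key: flush the last line of the previous group, if there was one.
    $1 != prev_key {
        if (matches) print prev_line
        matches = 0
        prev_key = $1
    }
    {prev_line = $0}
    # Use prev_line rather than $0, which not every awk preserves in END.
    END { if (matches) print prev_line }
' filename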
b|ipsum
b|dolor
d|amet
d|consectetur
e|adipisicing
e|elit
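If the file is not already grouped by key, one option is to sort it on the first field before filtering. The following is a minimal sketch, not the answer's exact script: `dup_key_lines` is a hypothetical function name, and the awk body is a variant that prints a group's first line as soon as a second line with the same key arrives, so it needs no END block. Note that `sort` is not itself a streaming tool, although common implementations such as GNU sort spill to temporary files rather than holding the whole input in memory.

```shell
# Hypothetical helper: print every line whose first |-separated field
# also appears on an adjacent line. Reads stdin, writes stdout.
dup_key_lines() {
    awk -F '|' '
        # Second or later line of a group.
        NR > 1 && $1 == prev_key {
            if (!printed) print prev_line   # first line of the group, emitted late
            print
            printed = 1
        }
        # First line of a new group: nothing printed for it yet.
        NR == 1 || $1 != prev_key { printed = 0 }
        { prev_key = $1; prev_line = $0 }
    '
}
```

A possible invocation on unsorted input is `LC_ALL=C sort -t '|' -k1,1 filename | dup_key_lines`. Without a stable-sort option, lines with equal keys come out ordered by their full text rather than in their original order.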
Alternative output
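awk -F \| '
    # Same key as the previous line: append the previous value to the output row.
    $1 == prev_key {
        if (matches == 0) printf "%s", $1
        printf "%s%s", FS, prev_value
        matches++
    }
    # New key: finish the row of the previous group, if there was one.
    $1 != prev_key {
        if (matches) printf "%s%s\n", FS, prev_value
        matches = 0
        prev_key = $1
    }
    {prev_value = $2}
    # Use prev_value rather than $2, which not every awk preserves in END.
    END {if (matches) printf "%s%s\n", FS, prev_value}
' filename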
b|ipsum|dolor
d|amet|consectetur
e|adipisicing|elit
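The merged-output script above carries only `$2` forward, so any fields after the second would be dropped. If lines can have more than two fields, a variant along these lines might help. It is a sketch under that assumption, with `merge_by_key` as a hypothetical function name. It builds each merged row in a variable and holds only the current group in memory, so the input must still be grouped by key.

```shell
# Hypothetical helper: merge adjacent lines sharing the first |-separated
# field into one row, keeping all trailing fields. Reads stdin.
merge_by_key() {
    awk -F '|' '
        {
            # Everything after the first separator, so extra fields survive.
            rest = substr($0, length($1) + 2)
        }
        # Same key as the current group: extend the merged row.
        NR > 1 && $1 == key {
            merged = merged FS rest
            count++
            next
        }
        # New key: emit the previous group only if it had several lines.
        {
            if (count > 1) print merged
            key = $1
            merged = $0
            count = 1
        }
        END { if (count > 1) print merged }
    '
}
```

It can be run as `merge_by_key < filename`. On the sample input it should produce the same three rows as the alternative output shown above.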