Command line to match lines with matching first field (sed, awk, etc.)


Question

What is a fast and succinct way to extract the lines of a text file whose first field also appears on another line?

Sample input:

a|lorem
b|ipsum
b|dolor
c|sit
d|amet
d|consectetur
e|adipisicing
e|elit

Desired output:

b|ipsum
b|dolor
d|amet
d|consectetur
e|adipisicing
e|elit

Desired output, alternative:

b|ipsum|dolor
d|amet|consectetur
e|adipisicing|elit

I can imagine many ways to write this, but I suspect there's a smart way to do it, e.g., with sed, awk, etc. My source file is approximately 0.5 GB.

There are some related questions here, e.g., "awk | merge line on the basis of field matching", but the approach in that question loads too much content into memory. I need a streaming method.

Recommended answer

Here's a method where you only have to remember the previous line. Because each line is compared only with the one before it, the input file must be sorted (or at least grouped) by the first field.

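awk -F '|' '
    # Same key as the previous line: the previous line belongs to a group.
    $1 == prev_key {print prev_line; matches++}
    # Key changed: flush the last line of the previous group, if any.
    $1 != prev_key {
        if (matches) print prev_line
        matches = 0
        prev_key = $1
    }
    {prev_line = $0}
    # Flush the final group; prev_line avoids relying on $0 inside END.
    END { if (matches) print prev_line }
' filename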

b|ipsum
b|dolor
d|amet
d|consectetur
e|adipisicing
e|elit
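
If the file is not already grouped by its first field, one possible approach is to sort it first and pipe the result into the same awk program. The following is only a sketch under stated assumptions: the function name `dup_lines` is hypothetical, and the `-s` (stable) option of `sort` is a GNU/BSD extension rather than part of POSIX. `LC_ALL=C` is set so that keys are compared bytewise. Note that sorting is no longer a single streaming pass, although `sort` implementations typically spill to temporary files rather than holding a large input entirely in memory.

```shell
# dup_lines: hypothetical helper name, not from the original answer.
# Reads pipe-delimited records on stdin, sorts them by the first field,
# and prints only the records whose first field occurs more than once.
dup_lines() {
    # -s (stable; GNU/BSD extension) keeps input order within each key.
    LC_ALL=C sort -s -t '|' -k1,1 |
    awk -F '|' '
        $1 == prev_key { print prev_line; matches++ }
        $1 != prev_key {
            if (matches) print prev_line
            matches = 0
            prev_key = $1
        }
        { prev_line = $0 }
        END { if (matches) print prev_line }
    '
}

# Usage with unsorted input: the b and d records are printed, a is dropped.
printf '%s\n' 'd|amet' 'a|lorem' 'b|ipsum' 'd|consectetur' 'b|dolor' | dup_lines
```

For a real file this would be invoked as `dup_lines < filename`.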

Alternative output

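awk -F '|' '
    $1 == prev_key {
        # First repeat of this key: start the merged line with the key.
        if (matches == 0) printf "%s", $1
        printf "%s%s", FS, prev_value
        matches++
    }
    $1 != prev_key {
        # Key changed: finish the merged line of the previous group.
        if (matches) printf "%s%s\n", FS, prev_value
        matches = 0
        prev_key = $1
    }
    {prev_value = $2}
    END {if (matches) printf "%s%s\n", FS, prev_value}
' filename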

b|ipsum|dolor
d|amet|consectetur
e|adipisicing|elit
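
The merged form above carries only the second field of each record, which matches the sample data. If records may contain more than two fields, one possible variant keeps everything after the first delimiter instead. This is a sketch, not part of the original answer: the function name `merge_dups` and the variable names `rest` and `prev_rest` are hypothetical, and it assumes a single-character delimiter and input already grouped by the first field.

```shell
# merge_dups: hypothetical helper name, not from the original answer.
# Expects stdin already sorted/grouped by the first pipe-delimited field.
merge_dups() {
    awk -F '|' '
        # Everything after the first delimiter, so extra fields survive.
        { rest = substr($0, length($1) + 2) }
        $1 == prev_key {
            if (matches == 0) printf "%s", $1
            printf "%s%s", FS, prev_rest
            matches++
        }
        $1 != prev_key {
            if (matches) printf "%s%s\n", FS, prev_rest
            matches = 0
            prev_key = $1
        }
        { prev_rest = rest }
        END { if (matches) printf "%s%s\n", FS, prev_rest }
    '
}

# Usage: the three-field b record is merged without losing its third field.
printf '%s\n' 'a|lorem' 'b|ipsum|x' 'b|dolor' 'c|sit' | merge_dups
```

On two-field input such as the sample data, this variant should produce the same merged lines as the program above.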
