bash, sed, awk remove block of text with a duplicate ID and an older date within the block

Published: 2020/9/15 8:31:34 · Tags: bash, awk, sed

Problem description

I would like to remove every block with a non-unique ID, except for the one that has the newest date.

I hope the examples speak for themselves. Any awk and/or sed solution would be appreciated!

Original file:

<BLOCK>
ID=1000
Text
Text
DATE=20160101
Text
</BLOCK>

<BLOCK>
Text
Text
ID=2000
DATE=20140101
Text
Text
</BLOCK>

<BLOCK>
ID=1000
DATE=20100101
Text
</BLOCK>

<BLOCK>
Text
ID=3000
Text
Text
DATE=20160101
Text
</BLOCK>

<BLOCK>
Text
Text
ID=2000
Text
DATE=20151231
</BLOCK>

The result should look like this:

<BLOCK>
ID=1000
Text
Text
DATE=20160101
Text
</BLOCK>

<BLOCK>
Text
ID=3000
Text
Text
DATE=20160101
Text
</BLOCK>

<BLOCK>
Text
Text
ID=2000
Text
DATE=20151231
</BLOCK>

Thanks for your help!

Recommended answer

This will work with any awk on any system:

$ cat tst.awk
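BEGIN { RS=""; ORS="\n\n" }    # paragraph mode: each blank-line-separated block is one record
{
    # Start from the whole block, then strip everything except the wanted value.
    id = date = $0
    gsub(/.*\nID=|\n.*/,"",id)        # what remains is the text after "ID="
    gsub(/.*\nDATE=|\n.*/,"",date)    # what remains is the text after "DATE="
}
# Keep this block if it is the first one seen for its ID or if its date is newer
# (YYYYMMDD values of equal length order correctly as strings).
date > dates[id] {
    dates[id] = date
    recs[id] = $0
}
END {
    # for-in visits the array elements in an unspecified order
    for (id in recs) {
        print recs[id]
    }
}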

$ awk -f tst.awk file
<BLOCK>
ID=1000
Text
Text
DATE=20160101
Text
</BLOCK>

<BLOCK>
Text
Text
ID=2000
Text
DATE=20151231
</BLOCK>

<BLOCK>
Text
ID=3000
Text
Text
DATE=20160101
Text
</BLOCK>

You don't explain what the output order should be, and it's not obvious from your example, so I assume you don't care. The script above therefore outputs the records in "random" (actually hash) order.
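
If the order does matter, note that the expected result in the question appears to list the surviving blocks in the order they occur in the original file (1000, 3000, 2000). The following is a minimal sketch of one way to get that order. It is a variation, not part of the answer above. The function name keep_newest and the arrays pos and recs (here indexed by record number) are hypothetical names chosen for illustration. Like the original script, it assumes every block contains exactly one ID= line and one DATE= line, and that dates are YYYYMMDD strings of equal length.

```shell
#!/bin/sh
# keep_newest: hypothetical helper name, not from the original answer.
# Reads blank-line-separated blocks from the named files (or stdin) and prints,
# for each ID, only the block with the newest DATE, in input order.
keep_newest() {
    awk '
    BEGIN { RS = ""; ORS = "\n\n" }          # paragraph mode: one record per block
    {
        id = date = $0
        gsub(/.*\nID=|\n.*/, "", id)         # keep only the value after "ID="
        gsub(/.*\nDATE=|\n.*/, "", date)     # keep only the value after "DATE="
    }
    date > dates[id] {                       # first block for this ID, or a newer one
        if (id in pos) {
            delete recs[pos[id]]             # drop the older block stored for this ID
        }
        dates[id] = date
        pos[id] = NR                         # record number of the newest block so far
        recs[NR] = $0
    }
    END {
        # Walk the record numbers in ascending order so survivors keep file order.
        for (i = 1; i <= NR; i++) {
            if (i in recs) {
                print recs[i]
            }
        }
    }
    ' "$@"
}
```

Called as keep_newest file, this should print the same three blocks as the script above, with the block for ID 3000 before the newer block for ID 2000, because each surviving block is emitted at its own position in the input.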
