Removing duplicate blocks of lines from a file


Problem description

I have a certain file structure like this

>ID1
data about ID1....
................
................

>ID2
data about ID2....
................
................
................
................
>ID3
data about ID3....
................
................
...............

>ID1
data about ID1....
................
>ID5
data about ID5....
................
................

I want to remove these duplicate blocks of IDs. For example, in the above case it is ID1. Note that only the ID part is the same; the data after it may differ. I want to keep the first block and remove all the others. How can I do this with shell scripting?

Answer

In awk:

awk '/^>/{p=!($0 in a);a[$0]}p' file1
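The one-liner keeps a per-block print flag: at each header line (one starting with `>`), it sets the flag to 1 only if that ID has not been seen before, and records the ID in an array; non-header lines simply inherit the current flag, so whole blocks are kept or skipped together. A minimal runnable sketch (the sample file path `/tmp/blocks.txt` is hypothetical, chosen for illustration):

```shell
# Build a small sample file with a duplicate ID1 block (hypothetical path).
cat > /tmp/blocks.txt <<'EOF'
>ID1
data about ID1
>ID2
data about ID2
>ID1
other data about ID1
EOF

# /^>/   : on a header line, p=1 only if this ID is new; remember it in a[].
# p      : print the current line whenever the flag is set, so every line of
#          a duplicate block (header and data) is skipped.
awk '/^>/{p=!($0 in a); a[$0]} p' /tmp/blocks.txt
# Prints the first ">ID1" block and the ">ID2" block; the second ">ID1"
# block is dropped entirely.
```

Because the key is the full header line (`$0`), two headers count as duplicates only if they match exactly; if IDs can carry trailing whitespace or extra fields, you would key on a normalized field instead.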

