Output whole line once for each unique value of a column (Bash)
Question
This must surely be a trivial task with awk or otherwise, but it's left me scratching my head this morning. I have a file with a format similar to this:
pep> AEYTCVAETK 2 genes ADUm.1024,ADUm.5198,ADUm.750
pep> AIQLTGK 1 genes ADUm.1999,ADUm.3560
pep> AIQLTGK 8 genes ADUm.1999,ADUm.3560
pep> KHEPPTEVDIEGR 5 genes ADUm.367
pep> VSSILEDKTT 9 genes ADUm.1192,ADUm.2731
pep> AIQLTGK 10 genes ADUm.1999,ADUm.3560
pep> VSSILEDKILSR 3 genes ADUm.2146,ADUm.5750
pep> VSSILEDKILSR 2 genes ADUm.2146,ADUm.5750
I would like to print a line for each distinct value of the peptides in column 2, meaning the above input would become:
pep> AEYTCVAETK 2 genes ADUm.1024,ADUm.5198,ADUm.750
pep> AIQLTGK 1 genes ADUm.1999,ADUm.3560
pep> KHEPPTEVDIEGR 5 genes ADUm.367
pep> VSSILEDKTT 9 genes ADUm.1192,ADUm.2731
pep> VSSILEDKILSR 3 genes ADUm.2146,ADUm.5750
This is what I've tried so far, but clearly neither does what I need:
awk '{print $2}' file | sort | uniq
# Prints only the peptides...
awk '{print $0, "\t", $1}' file |sort | uniq -u -f 4
# Altogether omits peptides which are not unique...
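The sort-based attempts above can also work if `sort` itself deduplicates on the peptide column instead of on the whole line. A minimal sketch, assuming single spaces between fields as in the sample: `-t ' ' -k2,2` makes the peptide the only sort key, and `-u` keeps one line per distinct key. Because the key is the whole field, VSSILED and VSSILEDKILSR stay distinct. Two caveats: the output comes back ordered by peptide rather than in file order, and POSIX does not specify which of several equal-key lines survives. The function name `dedup_by_peptide` and the temporary sample file are hypothetical.

```shell
#!/bin/sh
# Sample input copied from the question, written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
pep> AEYTCVAETK 2 genes ADUm.1024,ADUm.5198,ADUm.750
pep> AIQLTGK 1 genes ADUm.1999,ADUm.3560
pep> AIQLTGK 8 genes ADUm.1999,ADUm.3560
pep> KHEPPTEVDIEGR 5 genes ADUm.367
pep> VSSILEDKTT 9 genes ADUm.1192,ADUm.2731
pep> AIQLTGK 10 genes ADUm.1999,ADUm.3560
pep> VSSILEDKILSR 3 genes ADUm.2146,ADUm.5750
pep> VSSILEDKILSR 2 genes ADUm.2146,ADUm.5750
EOF

# Hypothetical helper: one line per distinct peptide, using sort alone.
dedup_by_peptide() {
  # -t ' '    split fields on single spaces, so field 2 is exactly the peptide
  # -k2,2     compare on field 2 only (the whole field, not a prefix)
  # -u        output only one line from each group of equal keys
  # LC_ALL=C  plain byte-wise ordering, independent of the user locale
  LC_ALL=C sort -t ' ' -k2,2 -u "$1"
}

dedup_by_peptide "$sample"
```

When the original file order matters, an order-preserving approach such as an awk filter is a better fit.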
One last thing: it will need to treat peptides that are substrings of other peptides as distinct values (e.g. VSSILED and VSSILEDKILSR). Thanks :)
Answer
One way, using awk:
awk '!array[$2]++' file.txt
Result:
pep> AEYTCVAETK 2 genes ADUm.1024,ADUm.5198,ADUm.750
pep> AIQLTGK 1 genes ADUm.1999,ADUm.3560
pep> KHEPPTEVDIEGR 5 genes ADUm.367
pep> VSSILEDKTT 9 genes ADUm.1192,ADUm.2731
pep> VSSILEDKILSR 3 genes ADUm.2146,ADUm.5750
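The one-liner relies on two awk behaviours: `array[$2]++` evaluates to the element's value before the increment (an element that has never been set counts as 0), and a pattern with no action block prints the whole line. So `!array[$2]++` is true only the first time a given column-2 value is seen. Because the array is keyed on the complete second field, VSSILED and VSSILEDKILSR are different keys, which covers the substring requirement. The commented sketch below spells this out; the helper name `first_per_peptide`, the array name `seen` and the temporary sample file are illustrative only.

```shell
#!/bin/sh
# Sample input copied from the question, written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
pep> AEYTCVAETK 2 genes ADUm.1024,ADUm.5198,ADUm.750
pep> AIQLTGK 1 genes ADUm.1999,ADUm.3560
pep> AIQLTGK 8 genes ADUm.1999,ADUm.3560
pep> KHEPPTEVDIEGR 5 genes ADUm.367
pep> VSSILEDKTT 9 genes ADUm.1192,ADUm.2731
pep> AIQLTGK 10 genes ADUm.1999,ADUm.3560
pep> VSSILEDKILSR 3 genes ADUm.2146,ADUm.5750
pep> VSSILEDKILSR 2 genes ADUm.2146,ADUm.5750
EOF

# Hypothetical helper: print the first line seen for each distinct column-2 value.
first_per_peptide() {
  awk '
    # seen[$2]++ yields the count BEFORE incrementing: 0 (false) on the first
    # occurrence of a peptide, 1 or more afterwards. Negating it makes the
    # pattern true only on the first occurrence, and a pattern without an
    # action block prints the whole line ($0).
    # Keys are whole-field strings, so VSSILED and VSSILEDKILSR never collide.
    !seen[$2]++
  ' "$1"
}

first_per_peptide "$sample"
```

Lines are printed in their original file order, and memory use grows with the number of distinct peptides rather than the number of lines.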