Find the number of occurrences and add it next to the pattern


Problem description

I have several files in a directory, and in some of them certain patterns occur multiple times. For example:

Contents of file "8_list":

Spiroplasma_taiwanense 
Spiroplasma_diminutum 
Spiroplasma_apis 
Spiroplasma_sabaudiense 
Spiroplasma_taiwanense 
Spiroplasma_diminutum 
Spiroplasma_taiwanense 
EntAcro10
EntAcro10
Spiroplasma_apis 
Spiroplasma_culicicola 
Spiroplasma_sabaudiense 
Spiroplasma_diminutum 
Spiroplasma_sabaudiense 
Spiroplasma_sabaudiense 
Spiroplasma_sabaudiense 
Spiroplasma_apis 
Spiroplasma_culicicola 
Spiroplasma_culicicola 
Spiroplasma_culicicola 
Spiroplasma_culicicola 
Spiroplasma_diminutum 
Spiroplasma_culicicola 
Spiroplasma_culicicola 
EntAcro1

and contents of file "574_list":

Mesoplasma_florum_l1
Spiroplasma_sabaudiense 
Mesoplasma_florum_w37
EntAcro1

All files have a single column. Within each file, I want to find the identical patterns and add a number next to each one giving its occurrence. For example, in file "8_list", if Spiroplasma_culicicola occurs 7 times, the first occurrence should become Spiroplasma_culicicola_1, the second Spiroplasma_culicicola_2, the third Spiroplasma_culicicola_3, and so on.

I tried to do it with sed by looking for each pattern individually:

sed -z 's/Spiroplasma_culicicola/Spiroplasma_culicicola_2/2'

but I was wondering if there is an easier way to do it for all files and all patterns in a given directory.

Thanks in advance.

Recommended answer

This is a good task for a tool like awk:

awk '{gsub(" ", "", $0); a[$0]++; print $0"_"a[$0]}' 8_list

gsub(" ", "", $0); - removes every space character from the line, which for this data strips the trailing space at the end of each line

a[$0]++; - increments the number of occurrences of each pattern (column value), using the column value as the array key

print $0"_"a[$0] - prints the line followed by an underscore and its current count

Output:

Spiroplasma_taiwanense_1
Spiroplasma_diminutum_1
Spiroplasma_apis_1
Spiroplasma_sabaudiense_1
Spiroplasma_taiwanense_2
Spiroplasma_diminutum_2
Spiroplasma_taiwanense_3
EntAcro10_1
EntAcro10_2
Spiroplasma_apis_2
Spiroplasma_culicicola_1
Spiroplasma_sabaudiense_2
Spiroplasma_diminutum_3
Spiroplasma_sabaudiense_3
Spiroplasma_sabaudiense_4
Spiroplasma_sabaudiense_5
Spiroplasma_apis_3
Spiroplasma_culicicola_2
Spiroplasma_culicicola_3
Spiroplasma_culicicola_4
Spiroplasma_culicicola_5
Spiroplasma_diminutum_4
Spiroplasma_culicicola_6
Spiroplasma_culicicola_7
EntAcro1_1
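
The command above handles a single file, while the question asks about every file in a directory. One way this might be done is a shell loop that runs awk once per file, so the counts start from zero for each file. The following is only a sketch under stated assumptions: the helper name number_occurrences, the *_list glob, the ".numbered" output suffix, and the throwaway demo directory are all illustrative and not part of the original answer. It also uses sub() with an anchored pattern instead of gsub(" ", ""), so that only trailing whitespace is removed and any space inside a name would be kept; for the sample data both variants give the same result.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption, not from the original answer):
# prints each line of the given file with _<running count> appended.
number_occurrences() {
    # sub() with an anchored regex strips trailing spaces/tabs only;
    # count[] is keyed by the cleaned line and incremented on each sighting.
    awk '{ sub(/[ \t]+$/, ""); count[$0]++; print $0 "_" count[$0] }' "$1"
}

# Demo input in a temporary directory (made-up, shortened sample data).
demo_dir=$(mktemp -d)
printf 'EntAcro1 \nSpiroplasma_apis \nEntAcro1 \n' > "$demo_dir/8_list"
printf 'EntAcro1\nMesoplasma_florum_l1\n' > "$demo_dir/574_list"

# One awk run per file, so occurrences are counted per file.
# Results go to <file>.numbered (an assumed naming choice); originals are untouched.
for f in "$demo_dir"/*_list; do
    number_occurrences "$f" > "$f.numbered"
done

cat "$demo_dir/8_list.numbered"
```

Writing to a separate file rather than editing in place keeps the original lists available for comparison; once the results look right, the numbered files can be moved over the originals.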
