Multifasta header trimming

Question

I have a multifasta file and I need to delete part of the header of every FASTA record. For example:

>Viridibacillus_arenosi_FSL_R5_0213-BK137_RS04360-22-CBS_domain-containing_protein <unknown description>
GCTAATGAAGTTATTGGCCTAGTGACAGAAAGGGATATAAAAAACGCGCTTCCTTCTTCC
CTGCTC------AAA
>Viridibacillus_arvi_DSM16317-AMD00_RS08865-16-acetoin_utilization_protein_AcuB <unknown description>
GCGAATGAAGTTATTGGCCTAGTAACAGAAAGGGATATAAAAAACGCCCTTCCATCTTCC
CTGCTC------AAA

I need to delete everything after the first "-" in each header, that is, "-BK137_RS04360-22-CBS_domain-containing_protein" and "-AMD00_RS08865-16-acetoin_utilization_protein_AcuB".

I tried:

 cut -d '-' -f 1 your_file.fasta > new_file.fasta

 awk '{split($0,a,"-"); if(a[1]) print ">"a[1]; else print; }' my_file.fasta > new_file.fasta

But this is an alignment file, so these commands also removed the "-" gap characters in my sequences, which of course I don't want.

Recommended answer

This is easily done with awk in the following way:

awk -F"-" '/^>/{print $1; next}1' in.fasta > out.fasta
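In the one-liner above, -F"-" makes "-" the field separator, the /^>/ pattern restricts the print $1 action to header lines, and the trailing 1 prints every other line unchanged, so gap characters in the sequences are left alone. As a rough sketch of the same idea, the variant below uses sub() on header lines instead of changing the field separator. The function name trim_fasta_headers and the one-record sample file are made up for illustration; the sample header is taken from the question.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption, not from the original answer):
# on header lines, delete everything from the first "-" to the end of the line;
# sequence lines are printed exactly as they are.
trim_fasta_headers() {
    awk '/^>/ { sub(/-.*/, "") } { print }' "$1"
}

# Build a one-record sample alignment based on the question's example.
sample=$(mktemp)
cat > "$sample" <<'EOF'
>Viridibacillus_arenosi_FSL_R5_0213-BK137_RS04360-22-CBS_domain-containing_protein <unknown description>
CTGCTC------AAA
EOF

trim_fasta_headers "$sample"
# Prints:
# >Viridibacillus_arenosi_FSL_R5_0213
# CTGCTC------AAA

rm -f "$sample"
```

Both forms also drop the trailing " <unknown description>", since it comes after the first "-". They assume the part of the header you want to keep never contains a "-" itself.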
