Splitting a multiple FASTA file into separate files, keeping their original names

Problem description

I am trying to work with an AWK script that was posted earlier on this forum. I am trying to split a large FASTA file containing multiple DNA sequences into separate FASTA files. I need to put each sequence into its own FASTA file, and the name of each new FASTA file needs to be the name of the DNA sequence from the original, large multi-FASTA file (all the characters after the >).

I tried this script that I found here on Stack Overflow:

awk '/^>chr/ {OUT=substr($0,2) ".fa"}; OUT {print >OUT}' your_input

It works well, but the DNA sequence begins directly after the name of the file, with no space. The DNA sequence needs to begin on a new line (regular FASTA format).
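To see whether a split file really has the header on its own line, the layout can be inspected directly instead of relying on how an editor displays it. The following is a minimal shell/awk sketch, not part of the original post; the function name `check_fasta_layout` and the sample files are hypothetical. It reports whether line 1 is a `>` header, whether line 2 starts the sequence, and whether any line ends in a Windows carriage return (`\r`). Line-ending mismatches are one common reason a FASTA file looks malformed in an editor, but whether that is the cause here is an assumption.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helper, not from the original post):
# for each FASTA file given, report whether line 1 is a ">" header,
# whether line 2 starts the sequence, and whether any line ends in "\r".

check_fasta_layout() {
    awk '
        FNR == 1 { hdr[FILENAME] = ($0 ~ /^>/) }               # header on line 1?
        FNR == 2 { seq[FILENAME] = ($0 != "" && $0 !~ /^>/) }  # sequence on line 2?
        /\r$/    { crlf[FILENAME] = 1 }                        # Windows line ending seen
        END {
            for (f in hdr)
                printf "%s: header=%s sequence_on_line2=%s crlf=%s\n", f,
                       (hdr[f] ? "ok" : "missing"),
                       (seq[f] ? "yes" : "no"),
                       (crlf[f] ? "yes" : "no")
        }
    ' "$@"
}

# Demo on made-up files in a scratch directory.
cd "$(mktemp -d)" || exit 1
printf '>chr1\nACGT\n' > good.fa
printf '\n>chr2\nGGCC\n' > blank_first.fa
printf '>chr3\r\nTTAA\r\n' > crlf.fa
check_fasta_layout good.fa blank_first.fa crlf.fa
```

The order of the report lines is not fixed, because awk does not guarantee an iteration order for `for (f in hdr)`. A file that starts with an empty line is reported as `header=missing`, since its first line is not the `>` header.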

I would appreciate any help solving this. Thank you!

Recommended answer

Do you mean something like this?

awk '/^>chr/ {OUT=substr($0,2) ".fa"; print "" > OUT}; OUT {print > OUT}' your_input

where the new file that is created for each "chromosome/sequence/thing" gets a blank line at the start?
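For reference, here is a minimal sketch of the same split with a few assumptions made explicit. It matches any header line (`/^>/` instead of `/^>chr/`) and still names each file after everything following the `>`. It also strips a trailing carriage return so Windows line endings do not end up in file names or sequences, and it closes each output file before opening the next, which avoids hitting open-file limits in some awk implementations when there are many sequences. The function name `split_fasta` and the sample input are hypothetical, and the sketch assumes every header text is unique and usable as a file name.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helper, not from the original answer):
# write each record of a multi-FASTA file to "<header text>.fa",
# with the header on line 1 and the sequence lines after it.
# Assumes every header text is unique and valid as a file name.

split_fasta() {
    awk '
        { sub(/\r$/, "") }                 # strip a trailing CR (Windows line endings)
        /^>/ {
            if (out != "") close(out)      # finish the previous file first
            out = substr($0, 2) ".fa"      # file name: everything after ">"
        }
        out != "" { print > out }          # header and sequence, one line each
    ' "$1"
}

# Demo on a made-up input in a scratch directory.
cd "$(mktemp -d)" || exit 1
printf '>chr1\nACGT\nTTGA\n>chr2\r\nGGCC\r\n' > input.txt
split_fasta input.txt
```

Run it as `split_fasta your_input` from the directory where the `.fa` files should be created, because the sketch writes to relative file names. Lines before the first header are skipped, since no output file is open yet.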
