Remove multiple sequences from a FASTA file

Views: 695 · Published: 2020/9/15 7:48:08 · Tags: bash, awk, sed, fasta

Question

I have a text file of character sequences in which each record consists of two lines: a header, followed by the sequence itself on the next line. The file is structured as follows:

>header1
aaaaaaaaa
>header2
bbbbbbbbbbb
>header3
aaabbbaaaa
[...]
>headerN
aaabbaabaa

In another file I have a list of the headers of the sequences I would like to remove, like this:

>header1
>header5
>header12
[...]
>header145

The idea is to remove these sequences from the first file, that is, each listed header plus the line that follows it. I did it with sed, like this:

while read line; do sed -i "/$line/,+1d" first_file.txt; done < second_file.txt

It works, but it takes quite long because sed reads and rewrites the whole file once per header, and the file is quite big. Any ideas on how I could speed this up?

Recommended answer

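Create a sed script with the delete commands from the second file. The ^ and $ anchors make each pattern match a whole header line, so that >header1 does not also delete >header12 or >header145. The addr,+1 range is a GNU sed extension:

sed 's#\(.*\)#/^\1$/,+1d#' secondFile.txt > commands.sed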
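
To see what the generated script looks like, here is a minimal sketch on made-up data. The temporary directory and the two sample headers are assumptions for illustration only, and the patterns are anchored with ^ and $ so that each one matches a complete header line:

```shell
# Hypothetical sample list of headers to remove (stand-in for secondFile.txt).
workdir=$(mktemp -d)
printf '%s\n' '>header1' '>header5' > "$workdir/secondFile.txt"

# Turn every header into an anchored "delete this line and the next" command.
# The addr,+1 range form requires GNU sed.
sed 's#\(.*\)#/^\1$/,+1d#' "$workdir/secondFile.txt" > "$workdir/commands.sed"

cat "$workdir/commands.sed"
# /^>header1$/,+1d
# /^>header5$/,+1d
```

Each header is used as a regular expression, so headers containing characters such as /, ., * or [ would need escaping before this approach is safe.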

Then apply that script to the first file. The filtered result is written to standard output:

sed -f commands.sed firstFile.txt
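
As an alternative that is not part of the original answer, the same filtering can probably be done in a single pass with awk, which compares headers as exact strings instead of regular expressions. The sketch below assumes every header line starts with > and uses made-up sample files in a temporary directory:

```shell
# Hypothetical sample data standing in for firstFile.txt and secondFile.txt.
workdir=$(mktemp -d)
printf '%s\n' '>header1' 'aaaaaaaaa' '>header2' 'bbbbbbbbbbb' \
              '>header12' 'aaabbbaaaa' > "$workdir/firstFile.txt"
printf '%s\n' '>header1' > "$workdir/secondFile.txt"

# While reading the first argument (NR == FNR), remember each header to remove.
# While reading the FASTA file, every header line decides whether its record
# is skipped; lines are printed only while skip is false.
awk 'NR == FNR { remove[$0]; next }
     /^>/      { skip = ($0 in remove) }
     !skip' "$workdir/secondFile.txt" "$workdir/firstFile.txt" > "$workdir/filtered.txt"

cat "$workdir/filtered.txt"
# >header2
# bbbbbbbbbbb
# >header12
# aaabbbaaaa
```

Because the lookup is an exact match on the whole line, >header1 leaves >header12 untouched, and sequences that span several lines are removed along with their header. Note that the NR == FNR idiom expects the list of headers to be non-empty.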
