Select bases between 100-200 and print them along with the header


Question

I have a multi-FASTA file from which I need to extract the bases at positions 100-200, together with their corresponding headers. I know that 'cut -c 100-200' can extract the bases, but it does not keep the corresponding headers. Is there a way to do this in Perl or bash?

Sample file:


>8YS68_00009_00025
GAGTTTGATCCTGGCTCAGAGCGAACGCTGGCGGCAGGCTTAACACATGCAAGTCGAGCGGGCGTAGCAATACGTCAGCGGCAGACGGGTGAGTAACGCGTGGGAACATACCTTTTGGTTCGGAACAACACAGGGAAACTTGTGCTAATACCGGATAAGCTACGGGAAGATT
>8YS68_00009_00027
GAGTTTGATCATGGCTCAGAGCGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAGCGCCGTAGCAATACGGAGCGGCAGACGGGTGAGTAACGCGTGGGAACGTACCTTTCGGTTCGGAATAACTCAGGGAAACTTGAGCTAATACCGAATACGTCCGTAAGGAGAAAGATTTATCGCCGAAAGATCGGCCCGCGTAAGATTAGCTAGTTGGTGAGGTAAGGCTCACCAAGCGACGATCGTTAGCTTGTC
>8YS68_00012_00035
GAGTTTGATCATGGCTCAGAACGAACGTTGGCGGCGTGGATTAGGCATGCAAGTCGAACGAATCCCATCTGGGTAACTGGGTGGGGGAAGTGGCGAAAGGGGCAGTAATGCGTGGGTAACCTACCTGGGGACCGGGATAGCCTCCTAACGGATGGGTAATACCGGATACGACCTTCGGAGGCATCTCCTGAAGG

Desired output:

seq id ------ATCGATCGATCG-----
seq id ------ATCGATCGATCG-----
seq id ------ATCGATCGATCG-----

In other words, I want to extract exactly the bases between positions 100 and 200 of each sequence, along with its header. If a sequence is shorter than 100 bp, it should be ignored.

Recommended answer

After reviewing the suggestions and working on this problem for some time, I found a solution in Perl. Here is the key loop I wrote that does the job.

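my $head;

# IN and OUT are filehandles opened elsewhere in the script.
# Assumes each record is one ">" header line followed by its
# sequence on a single line (no line wrapping).
while (my $line = <IN>) {
    chomp $line;                     # drop the newline so it is not counted as a base
    if ($line =~ m/^>/) {
        $head = $line;               # remember the most recent header
    }
    else {
        # Skip sequences too short to supply a full 100-base window.
        next if length($line) < 200;
        # substr offsets are 0-based: offset 100, length 100
        # returns bases 101-200 in 1-based numbering.
        my $subseq = substr($line, 100, 100);
        print OUT "$head\n";
        print OUT "$subseq\n";
    }
}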
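
Since the question also asked about bash, here is a minimal sketch of the same idea as a shell function around awk. It is not the author's method. It differs from the Perl loop above in two ways, both assumptions on my part: it joins sequences that are wrapped over several lines, and it uses 1-based inclusive positions 100-200 (101 bases, the same range as 'cut -c 100-200'). Records too short to cover the whole window are skipped. The function name extract_window and the variables WIN_START and WIN_END are hypothetical names introduced for illustration.

```shell
# Hypothetical helper: print each header followed by bases WIN_START..WIN_END
# (1-based, inclusive; defaults 100 and 200) of its sequence.
# Reads the named FASTA files, or standard input if none are given.
extract_window() {
    awk -v start="${WIN_START:-100}" -v end="${WIN_END:-200}" '
        # Print the buffered record if it is long enough to cover the window.
        function emit() {
            if (head != "" && length(seq) >= end)
                print head "\n" substr(seq, start, end - start + 1)
        }
        /^>/ { emit(); head = $0; seq = ""; next }  # a header starts a new record
             { seq = seq $0 }                       # join wrapped sequence lines
        END  { emit() }                             # do not forget the last record
    ' "$@"
}
```

It could be called as `extract_window input.fasta > output.fasta`, where both file names are placeholders. Whether a sequence with between 100 and 199 bases should be dropped or printed in truncated form is a design choice; this sketch drops it, as the Perl loop does.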

Thank you all for the help and support.
