Write a Perl script that takes in a FASTA file and reverses all the sequences (without BioPerl)?


Problem description

I don't know if this is just a quirk of Strawberry Perl, but I can't seem to get this script to run. I just need to take a FASTA file and reverse every sequence in it.

-Problem-

I have a multi-FASTA file:

>seq1
ABCDEFG
>seq2
HIJKLMN

and the expected output is:

>REVseq1
GFEDCBA
>REVseq2
NMLKJIH

The script is here:

$NUM_COL = 80; ## set the column width of output file
$infile = shift; ## grab input sequence file name from command line
$outfile = "test1.txt"; ## name output file, prepend with "REV"
open (my $IN, $infile);
open (my $OUT, '>', $outfile);
$/ = undef; ## allow entire input sequence file to be read into memory
my $text = <$IN>; ## read input sequence file into memory
print $text; ## output sequence file into new decoy sequence file
my @proteins = split (/>/, $text); ## put all input sequences into an array


for my $protein (@proteins) { ## evaluate each input sequence individually
    $protein =~ s/(^.*)\n//m; ## match and remove the first descriptive line of
    ## the FASTA-formatted protein
    my $name = $1; ## remember the name of the input sequence
    print $OUT ">REV$name\n"; ## prepend with #REV#; a # will help make the
    ## protein stand out in a list
    $protein =~ s/\n//gm; ## remove newline characters from sequence
    $protein = reverse($protein); ## reverse the sequence

    while (length ($protein) > $NUM_C0L) { ## loop to print sequence with set number of cols

    $protein =~ s/(.{$NUM_C0L})//;
    my $line = $1;
    print $OUT "$line\n";
    }
    print $OUT "$protein\n"; ## print last portion of reversed protein
}

close ($IN);
close ($OUT);
print "done\n";
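
An editorial note on the script above, offered as a likely explanation rather than a confirmed diagnosis: the wrapping loop refers to $NUM_C0L (spelled with a zero), while the width is stored in $NUM_COL (spelled with the letter O). Without use strict, the misspelled name is simply an undefined variable, so the loop cannot consume the sequence as intended. In addition, splitting text that begins with > yields an empty first element, which produces a stray >REV header. The sketch below keeps the same slurp-and-split approach with those two points addressed. The sub name reverse_fasta_text and its arguments are hypothetical and do not come from the question.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the question): takes the whole FASTA text
# and a column width, and returns the reversed FASTA text.
sub reverse_fasta_text {
    my ($text, $num_col) = @_;
    my $out = '';
    for my $record (split />/, $text) {
        next unless $record =~ /\S/;    # skip the empty chunk before the first '>'
        my ($name, @lines) = split /\r?\n/, $record;
        my $seq = scalar reverse join '', @lines;    # join wrapped lines, then reverse
        $out .= ">REV$name\n";
        $out .= "$_\n" for $seq =~ /.{1,$num_col}/g; # rewrap to the requested width
    }
    return $out;
}

# Demonstration with the sample data from the question.
print reverse_fasta_text(">seq1\nABCDEFG\n>seq2\nHIJKLMN\n", 80);
```

With use strict in place, a misspelled variable name such as $NUM_C0L is reported at compile time instead of silently evaluating to undef.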

Recommended answer

This will do as you ask.

It builds a hash %fasta out of the FASTA file, keeping an array @keys to preserve the order of the sequences, and then prints out each element of the hash.

Each line of the sequence is reversed using reverse before it is added to the hash, and unshift adds the lines of the sequence in reverse order.

The program expects the input file as a parameter on the command line, and prints the result to STDOUT, which may be redirected on the command line.

use strict;
use warnings 'all';

my (%fasta, @keys);

{
    my $key;

    while ( <> ) {

        chomp;

        if ( s/^>\K/REV/ ) {
            $key = $_;
            push @keys, $key;
        }
        elsif ( $key ) {
            unshift @{ $fasta{$key} }, scalar reverse;
        }
    }
}

for my $key ( @keys ) {
    print $key, "\n";
    print "$_\n" for @{ $fasta{$key} };
}

Output

>REVseq1
GFEDCBA
>REVseq2
NMLKJIH
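
To see why reversing each line and unshifting it reverses the whole record, here is a minimal sketch on an in-memory list of lines. The helper name reverse_lines is hypothetical and is not part of the answer. It checks that joining the result gives the same string as reversing the joined input.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the answer): reverse a multi-line record
# by reversing each line and stacking the lines in reverse order.
sub reverse_lines {
    my @lines = @_;
    my @out;
    for my $line (@lines) {
        unshift @out, scalar reverse($line);    # newest line goes to the front
    }
    return @out;
}

my @lines = ('ABCDEFGHI', 'JKLMNOPQRST', 'UVWXYZ');
my @rev   = reverse_lines(@lines);

# Joining the reversed lines equals reversing the joined sequence.
my $whole = scalar reverse join '', @lines;
print join('', @rev) eq $whole ? "same\n" : "different\n";
print "$_\n" for @rev;
```

Note that the line lengths also come out in reverse order, so a short final line in the input becomes a short first line in the output.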


If you prefer to rewrap the sequence so that short lines are at the end, then you just need to rewrite the code that dumps the hash.

This alternative uses the length of the longest line in the original file as the limit, and rewraps the reversed sequence to the same length. It would be simple to specify an explicit length instead of calculating it.

You will need to add use List::Util 'max' at the top of the program.

my $len = max map length, map @$_, values %fasta;

for my $key ( @keys ) {
    print $key, "\n";
    my $seq = join '', @{ $fasta{$key} };
    print "$_\n" for $seq =~ /.{1,$len}/g;
}

Given the original data, the output is identical to that of the solution above. I used this as input:

>seq1
ABCDEFGHI
JKLMNOPQRST
UVWXYZ
>seq2
HIJKLMN
OPQRSTU
VWXY

with this result. All lines have been wrapped to eleven characters, the length of the longest line (JKLMNOPQRST) in the original data.

>REVseq1
ZYXWVUTSRQP
ONMLKJIHGFE
DCBA
>REVseq2
YXWVUTSRQPO
NMLKJIH
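
As a sketch of the explicit-length variant mentioned earlier, the wrapping step can be pulled into a small sub that takes the width as an argument instead of computing it with max. The name wrap_seq and the width check are assumptions for illustration, not part of the answer.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the answer): wrap a sequence string to a
# caller-supplied width and return the resulting lines.
sub wrap_seq {
    my ($seq, $width) = @_;
    die "width must be a positive integer\n"
        unless defined $width && $width =~ /^[1-9][0-9]*$/;
    return $seq =~ /.{1,$width}/g;    # in list context: all chunks of up to $width chars
}

# Reverse the joined seq1 lines from the example input, then wrap to 11 columns.
my $seq = scalar reverse 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
print "$_\n" for wrap_seq($seq, 11);
```

Called with a width of 11, this prints the same three lines shown for REVseq1 above. Passing 80 instead would match the column width used in the question's script.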
