Perl loops within subroutines to display the longest repeating string that's selected for a particular subsection of the string

Problem description

I was wondering if anyone knows how to simplify, or generalize this code. It gives the correct answer, however it is only applicable to the current situation. My code is as follows:

sub longestRepeat{
  # the argument list @_ is: (sequence, nucleotide)
  my $someSequence = shift(@_);  # shift off the first  argument from the list
  my $whatBP       = shift(@_);  # shift off the second argument from the list
  my $match = 0;

  if ($whatBP eq "AT"){
    if ($someSequence =~ m/(([A][T])\2\2\2\2\2)/g) {
      $match = $1;
    }
    return $match;
  }

  if ($whatBP eq "TAGA"){
    if ($someSequence =~ m/(([T][A][G][A])\2\2)/g) {
      $match = $1;
    }
    return $match;
  }

  if ($whatBP eq "C"){
    if ($someSequence =~ m/(([C])\2\2)/g) {
      $match = $1;
    }
    return $match;
  }
}

My question is: in the second if statement, I have hard-coded the number of times the pattern repeats (which works for the string we were given). However, is there a way to use a while loop to keep searching for \2 (the pattern repeat)? What I mean is, can this: if ($someSequence =~ m/(([A][T])\2\2\2\2\2)/g) be simplified and generalized with a while loop?

Recommended answer

Based on the name of your subroutine, I'm assuming that you want to find the longest repeat sequence in your sequence.

If so, how about the following:

sub longest_repeat {

    my ( $sequence, $what ) = @_;

    my @matches = $sequence =~ /((?:$what)+)/g ;  # Store all matches

    my $longest;
    foreach my $match ( @matches ) {  # Could also avoid temp variable :
                                      # for my $match ( $sequence =~ /((?:$what)+)/g )

        $longest //= $match ;         # Initialize
                                      #  (could also do `$longest = $match
                                      #                    unless defined $longest`)

        $longest = $match if length( $longest ) < length( $match );
    }

    return $longest;  # Note this also handles the case of no matches
}
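
To see why `(?:$what)+` generalizes the hard-coded back-references from the question, here is a small sketch on a made-up sequence (the sequence and variable names are assumptions for illustration only). A capture followed by five `\2` back-references matches exactly six copies, which is the same as the counted quantifier `{6}`, whereas `+` takes however many consecutive copies are present:

```perl
use strict;
use warnings;

# Made-up sequence: seven consecutive copies of "AT" between flanking bases.
my $seq = 'GG' . ( 'AT' x 7 ) . 'CC';

# The question's style: one capture plus five back-references = six copies.
my ($fixed)   = $seq =~ /((AT)\2\2\2\2\2)/;

# The same fixed count written as a quantifier.
my ($counted) = $seq =~ /((?:AT){6})/;

# The answer's style: as many consecutive copies as are present.
my ($greedy)  = $seq =~ /((?:AT)+)/;

printf "%-8s %s (%d chars)\n", $_->[0], $_->[1], length $_->[1]
    for [ fixed => $fixed ], [ counted => $counted ], [ greedy => $greedy ];
```

The fixed-count forms stop at six copies even though a seventh is available, which is why they only fit the one input they were tuned for.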

If you can digest that, the following version achieves essentially the same functionality with a Schwartzian transform:

sub longest_repeat {

    my ( $sequence, $what ) = @_;                          # Example:
                                                           # --------------------
    my ( $longest ) = map { $_->[0] }                      # 'ATAT' ...
                        sort { $b->[1] <=> $a->[1] }       # ['ATAT',4], ['AT',2]
                          map { [ $_, length($_) ] }       # ['AT',2], ['ATAT',4]
                            $sequence =~ /((?:$what)+)/g ; # ... 'AT', 'ATAT'

    return $longest ;
}

Some may argue that sorting is wasteful because it is O(n log n) in the number of matches instead of O(n), but there's variety for ya.
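
Since the question asked specifically about a while loop, here is a minimal single-pass sketch that avoids both the temporary list and the sort. The name `longest_repeat_scan` and the test sequence are hypothetical, and wrapping `$what` in `\Q...\E` is an added assumption (it treats the motif as literal text rather than as a regex), not something the versions above do:

```perl
use strict;
use warnings;

# Hypothetical name: a single-pass variant of longest_repeat.
sub longest_repeat_scan {
    my ( $sequence, $what ) = @_;

    my $longest;    # stays undef when there is no match at all

    # In scalar context, /g resumes where the previous match ended, so the
    # loop visits each run of consecutive $what copies exactly once.
    # \Q...\E is an assumption here: it quotes regex metacharacters in $what.
    while ( $sequence =~ /((?:\Q$what\E)+)/g ) {
        $longest = $1
            if !defined $longest || length($1) > length($longest);
    }

    return $longest;
}

# Made-up test sequence, not the data from the original assignment.
my $seq = 'CCATATGGTAGATAGATAGAATATATCCCC';

for my $motif (qw( AT TAGA C N )) {
    my $run = longest_repeat_scan( $seq, $motif );
    print "$motif: ", ( defined $run ? $run : '(no match)' ), "\n";
}
```

Because it keeps only the strict maximum, ties go to the earliest run, and a motif that never occurs yields undef, as in the foreach version.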
