Find longest repeating string based on input in Perl (using subroutines)


Problem description

So I'm trying to find the longest repeat of a specific, given pattern. My code so far looks like this and is fairly close, but it does not quite give the wanted result:

use warnings;
use strict;    

my $DNA;       
$DNA = "ATATCCCACTGTAGATAGATAGAATATATATATATCCCAGCT" ;
print "$DNA\n" ;
print "The longest AT repeat is " . longestRepeat($DNA, "AT") . "\n" ;
print "The longest TAGA repeat is " . longestRepeat($DNA, "TAGA") . "\n" ;
print "The longest C repeat is " . longestRepeat($DNA, "C") . "\n" ;

sub longestRepeat{

  my $someSequence = shift(@_);  # shift off the first  argument from the list
  my $whatBP       = shift(@_);  # shift off the second argument from the list
  my $match = 0;



        if ($whatBP eq "AT"){
            while ($someSequence =~ m/$whatBP/g) {
            $match = $match + 1;
            }
            return $match;

        }
        if ($whatBP eq "TAGA"){
            while ($someSequence =~ m/$whatBP/g) {
            $match = $match + 1;
            }
            return $match;
        }

        if ($whatBP eq "C"){
            while ($someSequence =~ m/$whatBP/g) {
            $match = $match + 1;
            }
            return $match;
        }
}   

What it's doing right now is just counting the total number of ATs, TAGAs and Cs in the sequence. Instead of giving me only the length of the longest one, it sums them up and gives me the total. I think there is something wrong in the while loop, but I'm not sure. Any help would be much appreciated.

P.S. It should also display the longest repeat as a string, not as a number (probably using substr here).

Recommended answer

There's no need for your longestRepeat function to check which of the three cases it's handling. In general, when you find you've written exactly the same instructions multiple times, it's a hint that you can factor out the repetition and thereby simplify your program. Consider the following, which I've cleaned up for functionality and commented for illustrative purposes:

#!/usr/bin/env perl
use warnings;
use strict;    

# no need to declare and define separately; this works fine
# also no need for space before semicolon
my $DNA = "ATATCCCACTGTAGATAGATAGAATATATATATATCCCAGCT";
print "$DNA\n";
print "The longest AT repeat is " . longestRepeat($DNA, "AT") . "\n";
print "The longest TAGA repeat is " . longestRepeat($DNA, "TAGA") . "\n";
print "The longest C repeat is " . longestRepeat($DNA, "C") . "\n";

sub longestRepeat {

  # note that, within a function, @_ is the default argument to shift();
  # hence its absence in the next two lines. (in practice, you're more 
  # likely to see 'shift' in this context without even parentheses, much
  # less the full 'shift(@_)'; be prepared to run into it.)
  my $sequence = shift(); # take the first argument
  my $kmer = shift(); # take the second argument

  # these state variables we'll use to keep track of what we're doing here;
  # $longest_match, a string, will eventually be returned.
  my $longest_matchlen = 0;
  my $longest_match = '';

  # for each run in $sequence of one or more consecutive $kmer repeats...
  while ($sequence =~ m/((?:\Q$kmer\E)+)/g) {

    # ...get the length of the whole run, stored in $1 by the outer
    # capture group (a bare '($kmer)+' would leave only the last repeat
    # in $1), with the '+' quantifier grabbing the longest run available
    # from each starting point (see `man perlre' for more)...
    my $this_matchlen = length($1);

    # ...and if this match is longer than the longest yet found...
    if ($this_matchlen > $longest_matchlen) {

      # ...store this match's length in $longest_matchlen...
      $longest_matchlen = $this_matchlen;

      # ...and store the match itself in $longest_match.
      $longest_match = $1;

    }; # end of the 'if' statement

  }; # end of the 'while' loop

  # at this point, the longest match we found is in $longest_match; if
  # we found no matches, then $longest_match still contains the empty
  # string we assigned up there before the while loop started, which is
  # the correct result in a case where $kmer never appears in $sequence.
  return $longest_match;
};
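
One detail worth checking whenever a capture group is quantified: in Perl, a pattern like (AT)+ leaves only the last repetition in $1, while ((?:AT)+) captures the whole run. The following is a minimal sketch of that distinction, not a definitive implementation. The helper name longest_run is hypothetical (it is not part of the code above), and the \Q...\E quoting assumes the k-mer should be matched literally rather than as a regex.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the answer above): returns the longest
# run of consecutive $kmer copies found in $sequence, or '' if none.
sub longest_run {
    my ($sequence, $kmer) = @_;
    my $longest = '';

    # Outer parentheses capture the whole run; (?:...) only groups.
    # \Q...\E escapes any regex metacharacters in $kmer (an assumption:
    # the k-mer is treated as literal text).
    while ($sequence =~ /((?:\Q$kmer\E)+)/g) {
        $longest = $1 if length($1) > length($longest);
    }
    return $longest;
}

# A quantified capture group keeps only its final repetition:
"ATATAT" =~ /(AT)+/     and print "last repetition: $1\n";   # prints AT
"ATATAT" =~ /((?:AT)+)/ and print "whole run:       $1\n";   # prints ATATAT

# Usage on the sequence from the question.
my $DNA = "ATATCCCACTGTAGATAGATAGAATATATATATATCCCAGCT";
for my $kmer (qw(AT TAGA C)) {
    my $run = longest_run($DNA, $kmer);
    printf "The longest %s repeat is %s (%d copies)\n",
        $kmer, $run, length($run) / length($kmer);
}
```

Returning the string rather than a count keeps both pieces of information available: the caller can divide the length of the result by the length of the k-mer to recover the number of copies, as the loop at the end does.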

You're studying bioinformatics, aren't you? I have some experience of teaching Perl to bioinformaticians, and I gather there is an extremely broad distribution of programming skill and talent in the field, with a rather unfortunate hump toward the left-hand side of the graph. That is a polite way of saying that, speaking as a professional programmer, I have found most of the bioinformatics Perl code I've seen to range from not very good to quite poor indeed.

I mention this with no intent to insult, but only to substantiate my very strong recommendation that you include some computer science courses in whatever curriculum you're currently pursuing. The more exposure you can give yourself to the general concepts and habits of thinking involved in formulating algorithms accurately, the better prepared you'll be to tackle the requirements imposed by your field; indeed, better prepared than most, in my experience. While I'm not a bioinformatician myself, from working with people who are, it seems to me that a strong programming background may well be more useful to a bioinformatician than a strong background in biology.
