Eliminating uninitialized values in my Perl hash of arrays


Question


I successfully created a hash of arrays, and I am using it to calculate log-odds scores for each DNA sequence from a file (my earlier question, "Creating a hash of arrays for DNA sequences, Perl", has the input file format). I get a score for each sequence, but I also get a warning for each calculation, and I want to eliminate it. The warning is: Use of uninitialized value in string eq at line 148.


Here is a summarized version of the code (I can post the full code if necessary):

use strict;
use warnings;
use Data::Dumper;

#USER SPECIFICATIONS
print "Please enter the filename of the fasta sequence data: ";
my $filename1 = <STDIN>;

#Remove newline from file
chomp $filename1;

#Open the file and store each dna seq in hash
my %id2seq = ();
my %HoA = ();
my %loscore = ();
my $id = '';
open (FILE, '<', $filename1) or die "Cannot open $filename1.",$!;
my $dna;
while (<FILE>)
{
    if($_ =~ /^>(.+)/)
    {
         $id = $1; #Stores 'Sequence 1' as the first $id, for example
    }
    else
    {
        $HoA{$id} = [ split(//) ]; #Splits the contents to allow for position reference later
        $id2seq{$id} .= $_; #Creates a hash with each seq associated to an id number, used for calculating tables that have been omitted for space
        $loscore{$id} .= 0; #Creates a hash with each id number to have a log-odds score
    }
}
close FILE;

#User specifies motif width
print "Please enter the motif width:\n";
my $width = <STDIN>;

#Remove newline from file
chomp $width;

#Default width is 3 (arbitrary number chosen)
if ($width eq '')
{
    $width = 3;
}

#Omitting code about $width<=0, creation of log-odds score hash to save space

foreach $id (keys %HoA, %loscore)
{
    for my $pos (0..($width-1))
    {
        for my $base (qw( A C G T))
        {
            if ($HoA{$id}[$pos] eq $base) #ERROR OCCURS HERE
            {
                $loscore{$id} += $logodds{$base}[$pos];
            }
            elsif ( ! defined $HoA{$id}[$pos]) 
            {
                print "$pos\n"; 
            }
        }
    }
}
print Dumper(\%loscore);

The output I get is:

Use of uninitialized value in string eq at line 148, <STDIN> line 2.
2
(This error repeats 4 times for each position - most likely due to matching to each $base?)

$VAR1 = {
         'Sequence 15' => '-1.27764697876093',
         'Sequence 4' => '0.437512962981119',
         (continues for 29 sequences)
        }


To summarize, I want to calculate the log-odds score of each sequence. I have a log-odds hash, %logodds, that contains the score of each base at each position within a motif. The log-odds score of a sequence is the sum of the table values for the base found at each position. For example, if the log-odds table was

A 4 3 2
C 7 2 1
G 6 9 2
T 1 0 3


The log-odds score of the sequence CAG would be 7+3+2=12.
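
As a minimal sketch of the arithmetic just described, the table can be held as a hash of arrays keyed by base, and the score is the sum of one entry per position. The helper name score_motif is hypothetical and not part of the original script; the table values are the ones from the example.

```perl
use strict;
use warnings;

# Log-odds table from the example: one array of per-position scores per base.
my %logodds = (
    A => [4, 3, 2],
    C => [7, 2, 1],
    G => [6, 9, 2],
    T => [1, 0, 3],
);

# Hypothetical helper: add up the table entry for the base found at each position.
sub score_motif {
    my ($seq, $table) = @_;
    my $score = 0;
    my @bases = split //, $seq;
    for my $pos (0 .. $#bases) {
        my $row = $table->{ $bases[$pos] };
        # Skip bases that are not in the table and positions past the table width.
        next unless defined $row && defined $row->[$pos];
        $score += $row->[$pos];
    }
    return $score;
}

print score_motif('CAG', \%logodds), "\n";   # 7 + 3 + 2 = 12
```

Looking up a whole row by base avoids the inner loop over A, C, G, T, since the base itself is the hash key.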


At the moment, I believe that the error occurs because of the way I split the strings of DNA to be put into the hash of arrays. As I previously stated, if you want all the code so you can copy-paste, I can provide it. I think the solution is pretty simple, and I just need someone to point me in the right direction. Any and all help is appreciated, and I can clarify as questions arise. Also, any tips that could help me to post more concise questions are appreciated (I know this one is lengthy, I just want to provide enough background information).
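
One way to test that suspicion is sketched below, under assumptions that may not match the full script. The warning fires whenever `eq` is handed an undefined element, which happens if a stored sequence is shorter than the motif width or if the loop visits an id that has no array in %HoA. The posted loop list `keys %HoA, %loscore` flattens %loscore into both its keys and its values, so iterating over `keys %HoA` alone is probably safer. The helper names line_to_bases and base_matches are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one raw input line into an array reference of bases.
sub line_to_bases {
    my ($line) = @_;
    chomp $line;                 # drop the trailing newline so it is not stored as a "base"
    return [ split //, $line ];
}

# Hypothetical helper: compare a position against a base without triggering
# "Use of uninitialized value" when the position does not exist.
sub base_matches {
    my ($bases, $pos, $base) = @_;
    return 0 unless defined $bases->[$pos];   # test defined first, then eq
    return $bases->[$pos] eq $base ? 1 : 0;
}

my $bases = line_to_bases("CA\n");            # only two bases, but a motif width of 3
for my $pos (0 .. 2) {
    for my $base (qw(A C G T)) {
        print "position $pos is $base\n" if base_matches($bases, $pos, $base);
    }
}
```

In the posted loop the `defined` check sits in the `elsif`, so it only runs after the `eq` comparison has already produced the warning. Swapping the order of the two tests has the same effect as this helper.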

Recommended answer


Here is the code that I am using to iterate through `%HoA`. It calculates a log-odds score for each sequence, then works through each sequence to find its maximum score. Big thanks to everyone for helping out!

foreach $id (keys %HoA)
{
    for my $pos1 (0..$#{ $HoA{$id} }) #Last index of the array; length() on an array reference does not count its elements
    {
        for my $pos2 ($pos1..$pos1+($width-1))
        {
            for my $base (qw( A C G T))
            {
                if ($HoA{$id}[$pos2] eq $base)
                {
                    for my $pos3 (0..$width-1)
                    {
                        $loscore{$id} += $logodds{$base}[$pos3];

                        if ($loscore{$id} > $maxscore{$id})
                        {
                            $maxscore{$id} = $loscore{$id};
                        }
                    }
                }
                elsif ( ! defined $HoA{$id}[$pos2])
                {
                    print "$pos2\n";
                }
            }
        }
    }
}
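
For comparison, here is an alternative sketch of a sliding-window maximum. It is not the code above, and it assumes the goal is the best-scoring window of $width consecutive bases in each sequence. The helper name max_window_score is hypothetical, and the table reuses the example values from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: best log-odds score over all windows of $width bases.
# Returns undef when the sequence is shorter than the window.
sub max_window_score {
    my ($bases, $table, $width) = @_;
    my $max;
    for my $start (0 .. @$bases - $width) {
        my $score = 0;
        for my $offset (0 .. $width - 1) {
            my $row = $table->{ $bases->[$start + $offset] };
            # Bases other than A, C, G, T (or offsets past the table) add nothing.
            $score += $row->[$offset] if defined $row && defined $row->[$offset];
        }
        $max = $score if !defined $max || $score > $max;
    }
    return $max;
}

my %logodds = (A => [4, 3, 2], C => [7, 2, 1], G => [6, 9, 2], T => [1, 0, 3]);
my @bases = split //, 'TCAGG';
print max_window_score(\@bases, \%logodds, 3), "\n";
```

For TCAGG with a width of 3, the windows TCA, CAG and AGG score 5, 12 and 15, so this prints 15. Because the window start stops at the last full window, no position past the end of the array is ever read and no uninitialized-value warning is raised.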
