Perl: Creating and manipulating hash of arrays for log-odds scores of DNA sequences
Question
It's me again. I am having trouble creating a hash of arrays (HoA) even after looking at the documentation. I want the HoA to contain the log-odds score of a motif (a smaller sequence) within a DNA sequence. I want the structure to look like:
$HoA{$id}[$pos] = #score based on the position
where $id is the sequence ID and $pos is the position within the sequence at which the motif starts. My input is a .txt file containing DNA sequences, formatted like this:
>Sequence_1
TCAGAACCAGTTATAAATTTATCATTTCCTTCTCCACTCCT
>Sequence_2
CCCACGCAGCCGCCCTCCTCCCCGGTCACTGACTGGTCCTG
>Sequence_3
TCGACCCTCTGGAACCTATCAGGGACCACAGTCAGCCAGGCAAG
For example, a motif at position 2 (counting from 0) for Sequence 1 would be 'AGA'. Below is the code I have so far (simplified a little):
use strict;
use warnings;
use Data::Dumper;
print "Please enter the filename of the fasta sequence data: ";
my $filename1 = <STDIN>;
#Remove newline from file
chomp $filename1;
#Open the file and store each dna seq in hash
my %HoA = ();
my %loscore = ();
my $id = '';
open (FILE, '<', $filename1) or die "Cannot open $filename1.",$!;
my $dna;
while (<FILE>)
{
    if($_ =~ /^>(.+)/)
    {
        $id = $1; #Stores 'Sequence 1' as the first $id, etc.
    }
    else
    {
        $HoA{$id} = [ split(//) ]; #Splits the contents to allow for position reference later
        $loscore{$id} .= 0; #Creates a hash with each id number to have a log-odds score (initial score 0)
        $maxscore{$id} .= -30; #Creates a hash with each id number to have a maxscore (initial score -30)
    }
}
close FILE;
my $width = 3;
my %logodds; #I know there is a better way to do this - this is just for simplicity
$logodds{'A'}[0] = 0.1;
$logodds{'A'}[1] = 0.2;
$logodds{'A'}[2] = 0.3;
$logodds{'C'}[0] = 0.2;
$logodds{'C'}[1] = 0.5;
$logodds{'C'}[2] = 0.2;
$logodds{'G'}[0] = 0.3;
$logodds{'G'}[1] = 0.2;
$logodds{'G'}[2] = 0.4;
$logodds{'T'}[0] = 0.4;
$logodds{'T'}[1] = 0.1;
$logodds{'T'}[2] = 0.1;
print Dumper (\%logodds);
print "\n\n";
for my $base (qw( A C G T))
{
    print "logodds$base @{$logodds{$base}}\n";
}
my @arr;
foreach $id (keys %HoA)
{
    for my $pos1 (0..length($HoA{$id})-$width-1) #Look through all positions the motif can start at
    {
        for my $pos2 ($pos1..$pos1+($width-1)) #look through the positions at a specific motif starting point
        {
            for my $base (qw( A C G T))
            {
                if ($HoA{$id}[$pos2] eq $base) #If the character matches a base:
                {
                    for my $pos3 (0..$width-1) #for the length of the motif:
                    {
                        $arr[$pos1] += $logodds{$base}[$pos3];
                        @{ $loscore{$id}} = @arr; #Throws error here
                    }
                }
            }
        }
    }
}
print Dumper(\%loscore);
I keep getting the error: Can't use string ("0") as an ARRAY ref while "strict refs" in use at line 75.
An example of the log-odds score I want from this data:
$HoA{'Sequence 1'}[2] = 0.1 + 0.2 + 0.3 = 0.6
So the log-odds score of the motif 'AGA' that begins at position 2 in Sequence 1 is 0.6. I appreciate all of your patience and help! Let me know if I need to clarify anything.
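To make the expected result concrete, here is a minimal sketch of the per-position scoring, assuming the sequence contains only uppercase A, C, G and T and has no trailing newline. The helper name window_scores is hypothetical and not part of the original script; the %logodds values are the ones listed in the code above.

```perl
use strict;
use warnings;

# Position-specific log-odds values copied from the question.
my %logodds = (
    A => [0.1, 0.2, 0.3],
    C => [0.2, 0.5, 0.2],
    G => [0.3, 0.2, 0.4],
    T => [0.4, 0.1, 0.1],
);
my $width = 3;

# Hypothetical helper: returns an array ref holding one score per
# possible motif start position in the given sequence string.
sub window_scores {
    my ($seq) = @_;
    my @bases = split //, $seq;
    my @scores;
    for my $start (0 .. @bases - $width) {
        my $score = 0;
        # Add the value for each base at its offset inside the motif.
        for my $offset (0 .. $width - 1) {
            my $base = $bases[$start + $offset];
            $score += $logodds{$base}[$offset];
        }
        push @scores, $score;
    }
    return \@scores;
}

my %HoA = (
    'Sequence_1' => window_scores('TCAGAACCAGTTATAAATTTATCATTTCCTTCTCCACTCCT'),
);
printf "%.1f\n", $HoA{'Sequence_1'}[2];   # motif 'AGA' at position 2, prints 0.6
```

The score is formatted with printf because summing decimal fractions in floating point may not give exactly 0.6.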
Recommended answer
I think this solves the problem. Replace
$loscore{$id} .= 0;
$maxscore{$id} .= -30;
with
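# Run this after the file has been read, so that %HoA is populated.
# %maxscore also needs a "my %maxscore;" declaration under "use strict".
foreach $id (keys %HoA)
{
    # $HoA{$id} is an array reference, so count its elements with
    # scalar @{...}; length() would measure the stringified reference.
    for my $len (0..(scalar(@{ $HoA{$id} })-$width-1))
    {
        push @{ $loscore{$id} }, 0;
        push @{ $maxscore{$id} }, -30;
    }
}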
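As a small illustration of why this change helps, the sketch below reproduces the error in isolation and then shows the push form working. It is a standalone demo, not the original script; the variable names mirror the question, and the eval is only there so the demo keeps running after the error.

```perl
use strict;
use warnings;

my %loscore;
my $id = 'Sequence_1';

# What the question does: string concatenation stores the plain string "0".
$loscore{$id} .= 0;

# Dereferencing that string as an array dies under "strict refs".
my $ok = eval { my @copy = @{ $loscore{$id} }; 1 };
print "error: $@" unless $ok;

# Starting from an empty slot lets push autovivify an array reference.
delete $loscore{$id};
push @{ $loscore{$id} }, 0 for 1 .. 3;
print ref($loscore{$id}), " with ", scalar @{ $loscore{$id} }, " elements\n";
```

The first print should show the same "Can't use string ("0") as an ARRAY ref" message as in the question, and the second reports an ARRAY with 3 elements.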
Let me know if you have anything to add.