Creating a hash of arrays for DNA sequences, Perl
Problem description
I have a hash called %id2seq that contains strings of DNA sequences, each referenced by the key $id. I want to be able to manipulate the DNA sequences by using a position within the string as a reference. For example, if my DNA sequence was ACGTG, my $id would be Sequence 1, my $id2seq{'Sequence 1'} would be ACGTG, and my "theoretical" $id2seq{'Sequence 1'}[3] would be G.
I am attempting to create a hash of arrays to do this, but I'm getting a weird output (see the output below). I'm pretty sure that it's just my formatting. Any input is helpful, and I appreciate it in advance.
Here is a snippet of the input file:
>Sequence 1
TCAGAACCAGTTATAAATTTATCATTTCCTTCTCCACTCCT
>Sequence 2
CCCACGCAGCCGCCCTCCTCCCCGGTCACTGACTGGTCCTG
>Sequence 3
TCGACCCTCTGGAACCTATCAGGGACCACAGTCAGCCAGGCAAG
Here is a snippet of my attempt at the moment. (The line I previously used to build a plain hash of sequence strings is commented out.)
use strict;
use warnings;
print "Please enter the filename of the fasta sequence data: ";
my $filename1 = <STDIN>;
#Remove newline from file
chomp $filename1;
#Open the file and store each dna seq in hash
my %id2seq = ();
my $id = '';
open (FILE, '<', $filename1) or die "Cannot open $filename1.",$!;
my $dna;
while (<FILE>)
{
    if($_ =~ /^>(.+)/)
    {
        $id = $1;
    }
    else
    {
        ## $id2seq{$id} = $_; used to create hash table
        @seqs = split '', $_;
        $id2seq{$id} = [ @seqs ];
    }
}
close FILE;
foreach $id (keys %id2seq)
{
    print "$id2seq{$id}[@seqs]\n\n";
}
Output
Use of uninitialized value in concatenation (.) or string at line 37.
T
G
A
T
T
Recommended answer
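@seqs contains the characters from the last sequence line that was read. An array used as a subscript is evaluated in scalar context, so $id2seq{$id}[@seqs] actually means $id2seq{$id}[N], where N is the number of elements in @seqs, i.e. the length of the last sequence. So you print only one character from each sequence, and you get a warning whenever that sequence is too short to have an element at index N.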
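The scalar-context behaviour described above can be seen in isolation with a minimal sketch. The two short sequences and the stand-in @seqs below are made-up illustration data, not the asker's input file.

```perl
use strict;
use warnings;

# Hypothetical data: two short sequences stored as arrays of characters.
my %id2seq = (
    'Sequence 1' => [ split //, 'ACGTG' ],
    'Sequence 2' => [ split //, 'ACG' ],
);
my @seqs = split //, 'ACG';    # stands in for the last line that was read

# An array used as a subscript is evaluated in scalar context,
# so it yields its element count (3 here), not a list of indices.
my $n      = @seqs;
my $picked = $id2seq{'Sequence 1'}[@seqs];    # same as [3]
print "index used: $n, base picked: $picked\n";

# Index 3 is past the end of the shorter sequence, so the element is
# undef; interpolating it into a string is what triggers the warning.
my $missing = $id2seq{'Sequence 2'}[@seqs];
print defined $missing ? "got $missing\n" : "index $n is out of range\n";
```

Note that Perl array indices start at 0, so index 3 of ACGTG is the fourth base.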
If you are printing only for debugging, it is easier to use Data::Dumper:
use Data::Dumper;
print Dumper(\%id2seq);
Otherwise you have to iterate over $id2seq{$id} yourself in a nested loop.
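One way the nested loop might look is sketched below. It is an illustration rather than the answerer's own code: the FASTA text is a made-up sample read from an in-memory filehandle, and the names $fasta, $fh, $seq_id and @lines are assumptions. The sketch also chomps each line and declares its variables lexically, which the question's snippet does not do.

```perl
use strict;
use warnings;

# Hypothetical FASTA text, read through an in-memory filehandle so the
# sketch runs without an input file.
my $fasta = ">Sequence 1\nACGTG\n>Sequence 2\nTTGA\n";
open my $fh, '<', \$fasta or die "Cannot open in-memory file: $!";

my %id2seq;
my $id = '';
while (my $line = <$fh>) {
    chomp $line;    # drop the newline so it is not stored as a base
    if ($line =~ /^>(.+)/) {
        $id = $1;
    }
    elsif (length $line) {
        # Append, so a sequence that spans several lines is kept whole.
        push @{ $id2seq{$id} }, split //, $line;
    }
}
close $fh;

# Outer loop over the sequence ids, inner loop over each base.
my @lines;
for my $seq_id (sort keys %id2seq) {
    my $bases = $id2seq{$seq_id};      # array reference for this sequence
    for my $i (0 .. $#{$bases}) {      # indices are zero-based
        push @lines, "$seq_id\t$i\t$bases->[$i]";
    }
}
print "$_\n" for @lines;
```

Each output line holds the sequence id, the zero-based position, and the base at that position, separated by tabs.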