foreach loop not returning expected results


Problem description


I wrote the script below to analyze two output formats from bedtools (the -tab option and the -name option) so that it combines the headers when the sequences match. The problem I ran into is that when a sequence matches multiple names, the script prints only one of the corresponding names. Does anyone have a suggestion on how to approach this? I want both the position of the sequence and its name. Is there an option for this in bedtools itself?

My script stores each file in its own hash and then loops through both; when two sequences are equal, it is supposed to print the matching sequence with the appropriate names. It does this, but when multiple sequences correspond to a name it gives no error and simply does not print them. My conclusion is that the foreach loop is failing syntax-wise in some way I am not noticing. Any suggestions? Cheers.

sample data: -name output bedtools

     >sequence_a
     AGGT
     >sequence_b
     AAAA
     >sequence_c
     CCCC
     >sequence_d
     AAAA

sample data: -tab output bedtools

    >1-5
    AAAA
    >10-14
    ACCT
    >15-19
    CCCC

expected output from script

    >sequence_b|1-5
    AAAA
    >sequence_c|15-19
    CCCC
    >sequence_d|1-5
    AAAA

Script

    # Names file: header => sequence
    my %sequence;

    open(NAMES_FILE, $ARGV[0]) or die "Cannot open the file: $!";
    my $hash_key_name;
    my $hash_value_name;
    while (my $line = <NAMES_FILE>) {
        if ($line =~ /^>(\S+)/) {
            # Header line: remember the name for the sequence that follows
            $hash_key_name = $1;
        }
        elsif ($line =~ /\S/) {
            chomp $line;
            $hash_value_name = $line;
            $sequence{$hash_key_name} = $hash_value_name;
        }
    }

    # Positions file: position => sequence
    my %sequence_2;
    open(POSITIONS_FILE, $ARGV[1]) or die "Cannot open the file: $!";
    my $hash_key_pos;
    my $hash_value_pos;
    while (my $line2 = <POSITIONS_FILE>) {
        if ($line2 =~ /^>(\S+)/) {
            $hash_key_pos = $1;
        }
        elsif ($line2 =~ /\S/) {
            chomp $line2;
            $hash_value_pos = $line2;
            $sequence_2{$hash_key_pos} = $hash_value_pos;
        }
    }

    # Compare every position against every name and print the matches
    foreach $hash_key_pos (keys %sequence_2) {
        foreach $hash_key_name (keys %sequence) {
            if ($sequence{$hash_key_name} eq $sequence_2{$hash_key_pos}) {
                print ">$hash_key_name|$hash_key_pos\n$sequence{$hash_key_name}\n";
            }
        }
    }

Solution

Hashes will happily overwrite values, saving only the latest value, without throwing errors. If you want to catch that, you will need to put an explicit check in to see if the hash has a value before you overwrite it, something like:

    while (my $line = <NAMES_FILE>) {
        if ($line =~ /^>(\S+)/) {
            $hash_key_name = $1;
        }
        elsif ($line =~ /\S/) {
            chomp $line;
            $hash_value_name = $line;
            if (defined($sequence{$hash_key_name}) && $sequence{$hash_key_name} ne $hash_value_name) {
                die("multiple sequences match $hash_key_name: $sequence{$hash_key_name}, $hash_value_name");
            }
            $sequence{$hash_key_name} = $hash_value_name;
        }
    }
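
To see the overwrite behaviour in isolation, here is a minimal sketch. The input records are made up for illustration (the header sequence_a deliberately appears twice) and are not taken from the question's data; instead of dying, this version just records which headers were replaced.

```perl
use strict;
use warnings;

# Hypothetical records: "sequence_a" appears twice with different sequences.
my @records = (
    [ 'sequence_a', 'AGGT' ],
    [ 'sequence_a', 'AAAA' ],
    [ 'sequence_b', 'CCCC' ],
);

my %sequence;
my @overwritten;    # headers whose earlier value was replaced

for my $record (@records) {
    my ($name, $seq) = @$record;

    # Check for an existing, different value before it is clobbered.
    if (exists $sequence{$name} && $sequence{$name} ne $seq) {
        push @overwritten, $name;
    }
    $sequence{$name} = $seq;    # silently replaces any earlier value
}

print "keys kept: ", scalar(keys %sequence), "\n";
print "overwritten: @overwritten\n";
```

Three records go in, but the hash ends up with two keys, and no warning is printed even with `use warnings` enabled.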

That being said, it would be most helpful if you could provide sample data which produced the error you are trying to catch. It looks as if the data above should not contain this error.
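
If the real files do turn out to contain repeated headers or positions, one possible approach is to key the hashes by sequence and store every header in an array, so nothing gets overwritten. The following is only a sketch under that assumption: `group_by_sequence` and `merge_headers` are hypothetical helper names, the sample data from the question is inlined rather than read from `$ARGV[0]` and `$ARGV[1]`, trailing whitespace is stripped (which the original script does not do), and the output is sorted so the order is stable.

```perl
use strict;
use warnings;

# Map each sequence to an array of every header that carries it.
sub group_by_sequence {
    my @lines = @_;
    my (%headers_for, $header);
    for my $line (@lines) {
        $line =~ s/\s+\z//;    # drop newline and any trailing whitespace
        if ($line =~ /^>(\S+)/) {
            $header = $1;
        }
        elsif ($line =~ /\S/ && defined $header) {
            push @{ $headers_for{$line} }, $header;
        }
    }
    return \%headers_for;
}

# Pair every name with every position that shares the same sequence.
sub merge_headers {
    my ($names, $positions) = @_;    # hash refs from group_by_sequence
    my @out;
    for my $seq (keys %$names) {
        next unless exists $positions->{$seq};
        for my $name (@{ $names->{$seq} }) {
            for my $pos (@{ $positions->{$seq} }) {
                push @out, ">$name|$pos\n$seq\n";
            }
        }
    }
    my @sorted = sort @out;    # hash order is random, so sort for stable output
    return @sorted;
}

# Sample data from the question, inlined for the sketch.
my @names = (
    ">sequence_a", "AGGT", ">sequence_b", "AAAA",
    ">sequence_c", "CCCC", ">sequence_d", "AAAA",
);
my @positions = (">1-5", "AAAA", ">10-14", "ACCT", ">15-19", "CCCC");

print merge_headers(group_by_sequence(@names), group_by_sequence(@positions));
```

On the sample data this prints the three records from the expected output, with sequence_b and sequence_d both paired with 1-5. In a real script the line lists would come from the two input files instead of the inlined arrays.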
