How to reference a hash of array of hashes in order to compare values

Question

I have the following data structure:

my %hash = (
    'hsa_circ_0024017|chr11:93463035-93463135+|NM_033395|KIAA1731  FORWARD' => [ 
        { 
          'energy' => '-4.3', 
          'spacer' => 'AGGCACC', 
          'end' => '97', 
          'start' => '81' 
        } 
    ],
    'hsa_circ_0067224|chr3:128345575-128345675-|NM_002950|RPN1  FORWARD' => [ 
        { 
          'energy' => '-4.4', 
          'spacer' => 'CAGT', 
          'end' => '17', 
          'start' => '6' 
        }, 
        { 
          'energy' => '-4.1', 
          'spacer' => 'GTT', 
          'end' => '51', 
          'start' => '26' 
        }, 
        { 
          'energy' => '-4.1', 
          'spacer' => 'TTG', 
          'end' => '53', 
          'start' => '28' 
        } 
    ],
    ...
);

How do I access the contents of my hash to be able to compare the contents within a loop?

For each parent hash key (hsa_circ...) I want to compare the child hashes (spacers) with one another. Forgive me, I'm struggling to word this right. This is a small sample of the data, of course. My goal, in brief, is to detect the hashes within an array that have the same spacer and, if they do, to choose the one with the lowest energy score.

Solution

The problem: each arrayref may contain groups of hashrefs with equal spacer values. In each such group, the hashref with the lowest energy value needs to be identified, to replace that group.
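On the access question itself, here is a minimal sketch of walking a hash of arrays of hashes. The key kA and its values are made-up sample data in the same shape as the question's, and @seen is a hypothetical name used only to collect what the loop visits.

```perl
use strict;
use warnings;

# Hypothetical sample data, shaped like the question's %hash
my %hash = (
    kA => [
        { energy => -4.3, spacer => 'AGGCACC' },
        { energy => -2.3, spacer => 'AGGCACC' },
    ],
);

my @seen;    # "key spacer energy" strings, collected for inspection
for my $key (sort keys %hash) {
    # $hash{$key} is an arrayref; dereference it to loop over its hashrefs
    for my $hr (@{ $hash{$key} }) {
        # $hr->{spacer} and $hr->{energy} reach the innermost values
        push @seen, "$key $hr->{spacer} $hr->{energy}";
    }
}
print "$_\n" for @seen;
```

Inside the inner loop any two hashrefs of the same array can be compared, for example by spacer with eq and by energy with <.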

Most of the work is done in partition_equal(), which identifies groups of hashrefs with equal spacers:

use warnings;
use strict;
use List::Util qw(reduce);
use Data::Dump qw(dd);

# Test data: two groups of equal-spacer hashrefs, in the first array only
my %hash = (
    kA => [
        { 'energy' => -4.3, 'spacer' => 'AGGCACC' },
        { 'energy' => -2.3, 'spacer' => 'AGGCACC' },
        { 'energy' => -3.3, 'spacer' => 'CAGT' },
        { 'energy' => -1.5, 'spacer' => 'GTT' },
        { 'energy' => -2.5, 'spacer' => 'GTT' },
    ],
    kB => [
        { 'energy' => -4.4, 'spacer' => 'CAGT' },
        { 'energy' => -4.1, 'spacer' => 'GTT' },
        { 'energy' => -4.1, 'spacer' => 'TTG' },
    ],
);
#dd \%hash;

for my $key (keys %hash) {
    my ($spv, $unique) = partition_equal($hash{$key});
    # Nothing returned: no equal-spacer groups, leave this arrayref as it is
    next if not $spv;
    # Extract minimum-energy hashref from each group and add to arrayref
    # $unique, so that it can eventually overwrite this key's arrayref
    foreach my $spacer (keys %$spv) {
        my $hr_min = reduce {
            $a->{energy} < $b->{energy} ? $a : $b
        } @{$spv->{$spacer}};
        push @$unique, $hr_min;
    }
    # new: unique + lowest-energy ones for each equal-spacer group
    $hash{$key} = $unique;
}
dd \%hash;

# Sort array and compare neighbouring elements (hashrefs).
# Returns (\%spv, \@unique), or an empty list if no spacer value repeats.
sub partition_equal {
    my $ra = shift;

    # Fewer than two elements cannot form a group; returning early also
    # avoids autovivifying $sr[1] and "uninitialized" warnings below
    return if @$ra < 2;

    my @sr = sort { $a->{spacer} cmp $b->{spacer} } @$ra;

    # %spv:    spacer value => [ hashrefs with it ], ...
    # @unique: hashrefs with unique spacer values
    my (%spv, @unique);

    # Process first and last separately, to not have to test for them
    ($sr[0]{spacer} eq $sr[1]{spacer})
        ? push(@{$spv{$sr[0]{spacer}}}, $sr[0])
        : push(@unique, $sr[0]);
    for my $i (1..$#sr-1) {
        if ($sr[$i]{spacer} eq $sr[$i-1]{spacer}  or
            $sr[$i]{spacer} eq $sr[$i+1]{spacer})
        {
            push @{$spv{$sr[$i]{spacer}}}, $sr[$i];
        }
        else { push @unique, $sr[$i] }
    }
    ($sr[-1]{spacer} eq $sr[-2]{spacer})
        ? push(@{$spv{$sr[-1]{spacer}}}, $sr[-1])
        : push(@unique, $sr[-1]);

    return if not keys %spv;
    return \%spv, \@unique;
}

Output

kA => [
        { energy => -3.3, spacer => "CAGT" },
        { energy => -2.5, spacer => "GTT" },
        { energy => -4.3, spacer => "AGGCACC" },
      ],
kB => [
        { energy => -4.4, spacer => "CAGT" },
        { energy => -4.1, spacer => "GTT" },
        { energy => -4.1, spacer => "TTG" },
      ],

The order inside the arrayrefs is not maintained; the new arrayref holds first the hashrefs with unique spacer values, then the lowest-energy hashref from each original group of equal spacer values.

The sub sorts the input by spacer value, so that it can identify equal ones by simply iterating through the sorted array and comparing only neighbors. This should be reasonably efficient, since the sort dominates the cost.
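If the original order matters, a different technique from the sort-and-compare approach above is to group by spacer in a hash during a single pass. The following is only a sketch under that assumption: lowest_energy_per_spacer is a hypothetical helper name, not part of the answer's code, and on equal energies it keeps the hashref seen first.

```perl
use strict;
use warnings;

# Hypothetical helper: for each spacer value keep the hashref with the
# lowest energy, returned in order of first appearance of each spacer.
sub lowest_energy_per_spacer {
    my ($ra) = @_;
    my (%best, @order);
    for my $hr (@$ra) {
        my $sp = $hr->{spacer};
        if (not exists $best{$sp}) {
            push @order, $sp;      # remember first-seen order
            $best{$sp} = $hr;
        }
        elsif ($hr->{energy} < $best{$sp}{energy}) {
            $best{$sp} = $hr;      # strictly lower energy replaces it
        }
    }
    return [ map { $best{$_} } @order ];
}

# Same test values as the kA array above
my $result = lowest_energy_per_spacer([
    { energy => -4.3, spacer => 'AGGCACC' },
    { energy => -2.3, spacer => 'AGGCACC' },
    { energy => -3.3, spacer => 'CAGT' },
    { energy => -1.5, spacer => 'GTT' },
    { energy => -2.5, spacer => 'GTT' },
]);
print "$_->{spacer} $_->{energy}\n" for @$result;
```

Applied per key, this would be used as $hash{$key} = lowest_energy_per_spacer($hash{$key}); it also copes with empty and single-element arrays without special cases.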
