Sorting a hash in Perl when the keys are dynamic


Problem description


I have a hash as follows:

my %data = (
    'B2' => {
        'one' => {
            timestamp => '00:12:30'
        },
        'two' => {
            timestamp => '00:09:30'
        }
    },
    'C3' => {
        'three' => {
            timestamp => '00:13:45'
        },
        'adam' => {
            timestamp => '00:09:30'
        }
    }
);

(The structure is actually more complex than that; I'm simplifying it here.)

I wish to sort "globally" on timestamp and then on the keys of the inner hashes (one, two, three, adam). But the keys of the inner hashes are dynamic; I don't know what they are going to be until the data is read from files.

I want the sorted output of the above hash to be:

00:09:30,C3,adam
00:09:30,B2,two
00:12:30,B2,one
00:13:45,C3,three

I've looked at many questions/answers regarding sorting hashes by keys and/or values, but I haven't been able to figure it out when the key names are not known ahead of time. (Or maybe I'm just not understanding it.)

What I'm doing for now is two steps.

Flattening the hash into an array:

my @flattened;
for my $outer_key (keys %data) {
    for my $inner_key (keys %{$data{$outer_key}}) {
        push @flattened, [
            $data{$outer_key}{$inner_key}{timestamp}
            , $outer_key
            , $inner_key
        ];
    }
}

And then doing the sort:

for my $ary (sort { $a->[0] cmp $b->[0] || $a->[2] cmp $b->[2] } @flattened) {
    print join ',' => @$ary;
    print "\n";
}

I'm wondering if there is a more concise, elegant, efficient way of doing this?

Solution

This type of question might be better suited to the Programmers Stack Exchange site or the Code Review one. Since it is asking about implementation, I think it's fine to ask here. The sites tend to have some overlap.


As @DondiMichaelStroma pointed out, and as you already know, your code works great! However, there is more than one way to do it. For me, if this was in a small script, I would probably leave it as is and move on to the next part of the project. If this was in a more professional code base, I would make some changes.

For me, when writing for a professional code base, I try to keep a few things in mind.

  • Readability
  • Efficiency when it matters
  • Not gold-plating it
  • Unit Testing

So let's take a look at your code:

my %data = (
    'B2' => {
        'one' => {
            timestamp => '00:12:30'
        },
        'two' => {
            timestamp => '00:09:30'
        }
    },
    'C3' => {
        'three' => {
            timestamp => '00:13:45'
        },
        'adam' => {
            timestamp => '00:09:30'
        }
    }
);

The way data is defined is excellent and nicely formatted. This may not be how %data is built in your code, but maybe a unit test would have a hash like that.

my @flattened;
for my $outer_key (keys %data) {
    for my $inner_key (keys %{$data{$outer_key}}) {
        push @flattened, [
            $data{$outer_key}{$inner_key}{timestamp}
            , $outer_key
            , $inner_key
        ];
    }
}
for my $ary (sort { $a->[0] cmp $b->[0] || $a->[2] cmp $b->[2] } @flattened) {
    print join ',' => @$ary;
    print "\n";
}

The variable names could be more descriptive, and the @flattened array has some redundant data in it. Printing it with Data::Dumper, you can see we have C3 and B2 in multiple places.

$VAR1 = [
          '00:13:45',
          'C3',
          'three'
        ];
$VAR2 = [
          '00:09:30',
          'C3',
          'adam'
        ];
$VAR3 = [
          '00:12:30',
          'B2',
          'one'
        ];
$VAR4 = [
          '00:09:30',
          'B2',
          'two'
        ];

Maybe this isn't a big deal, or maybe you want to keep the functionality of getting all the data under the key B2.

Here's another way we could store that data:

my %flattened = (
    'B2' => [['one', '00:12:30'],
             ['two', '00:09:30']],
    'C3' => [['three','00:13:45'],
             ['adam', '00:09:30']]
);

It may make the sorting more complicated, but it makes the data structure simpler! Maybe this is getting closer to gold-plating, or maybe you'd benefit from this data structure in another part of the code. My preference is to keep data structures simple, and add extra code if needed when processing them. If you decide you need to dump %flattened to a log file, you might appreciate not seeing duplicate data.
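To make the trade-off a little more concrete, here is a minimal sketch of reading the grouped structure back out for a single outer key. It assumes the %flattened hash shown above; the helper name entries_for is hypothetical and exists only for this illustration.

```perl
use strict;
use warnings;

my %flattened = (
    'B2' => [ ['one',   '00:12:30'], ['two',  '00:09:30'] ],
    'C3' => [ ['three', '00:13:45'], ['adam', '00:09:30'] ],
);

# Hypothetical helper: return "name,timestamp" strings for one outer key,
# ordered by timestamp and then by name. Unknown keys give an empty list.
sub entries_for {
    my ( $grouped, $outer_key ) = @_;
    my @pairs = sort { $a->[1] cmp $b->[1] || $a->[0] cmp $b->[0] }
                @{ $grouped->{$outer_key} || [] };
    return map { join( ',', @{$_} ) } @pairs;
}

print "$_\n" for entries_for( \%flattened, 'B2' );
```

Everything stored under B2 stays reachable with a single hash lookup, which is the functionality the flat array gives up.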


Implementation

Design: I think we want to keep this as two distinct operations. This will help code clarity and we could test each function individually. The first function would convert between the data formats we want to use, and the second function would sort the data. These functions should be in a Perl module, and we can use Test::More to do the unit testing. I don't know where we are calling these functions from, so let's pretend we are calling them from main.pl, and we can put the functions in a module called Helper.pm. These names should be more descriptive, but again I'm not sure what the application is here! Great names lead to readable code.


main.pl

This is what main.pl could look like. Even though there are no comments, the descriptive names can make it self-documenting. These names could still be improved too!

#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
use Utilities::Helper qw(sort_by_times_then_names convert_to_simple_format);

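# populate_data() stands in for however %data is actually built (for example, read from files).
my %data = populate_data();

my @sorted_data = @{ sort_by_times_then_names( convert_to_simple_format( \%data ) ) };

# Pass a reference so the whole array is dumped as a single structure.
print Dumper(\@sorted_data);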


Utilities/Helper.pm

Is this readable and elegant? I think it could use some improvements. More descriptive variable names would help in this module as well. However, it is easily testable, and keeps our main code clean and data structures simple.

package Utilities::Helper;
use strict;
use warnings;

use Exporter qw(import);
our @EXPORT_OK = qw(sort_by_times_then_names convert_to_simple_format);

# We could put a comment here explaining the expected input and output formats.
sub sort_by_times_then_names {

    my ( $data_ref ) = @_;

    # Here we can use a map-and-sort pipeline in the style of the Schwartzian Transform.
    # Normally, we would just be sorting an array. But here we
    # are converting the hash into an array and then sorting it.
    # Maybe that should be broken up into two steps to make it more clear!
    # A full Schwartzian Transform ends with one more map ( map { $_ } ... );
    # we don't actually need it here, because the rows are already in their final shape.
    my @sorted = sort {
                        $a->[2] cmp $b->[2] # sort by timestamp
                                 ||
                        $a->[1] cmp $b->[1] # then sort by name
                      }
                 map  { my $outer_key=$_;       # convert $data_ref to an array of arrays
                        map {                    # first element is the outer_key
                             [$outer_key, @{$_}] # second element is the name
                            }                    # third element is the timestamp
                            @{$data_ref->{$_}}
                      }
                      keys %{$data_ref};
    # If you want the elements in a different order in the array,
    # you could modify the above code or change it when you print it.
    return \@sorted;
}


# We could put a comment here explaining the expected input and output formats.
sub convert_to_simple_format {
    my ( $data_ref ) = @_;

    my %reformatted_data;

    # $outer_key and $inner_key could be renamed to more accurately describe the data they represent.
    # Are they names? IDs? Places? License plate numbers?
    # Maybe we want to keep it generic so this function can handle different kinds of data.
    # I still like the idea of using nested for loops for this logic, because it is clear and intuitive.
    for my $outer_key ( keys %{$data_ref} ) {
        for my $inner_key ( keys %{$data_ref->{$outer_key}} ) {
            push @{$reformatted_data{$outer_key}},
                 [$inner_key, $data_ref->{$outer_key}{$inner_key}{timestamp}];
        }
    }

    return \%reformatted_data;
}

1;
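
The comment inside sort_by_times_then_names wonders whether the conversion and the sort should be two separate steps. One way that could look is sketched below. The names flatten_rows and sort_rows are hypothetical and not part of the module above, and the input is assumed to be in the format returned by convert_to_simple_format.

```perl
use strict;
use warnings;

# Step 1: turn { outer_key => [ [name, timestamp], ... ] } into
# a flat list of [outer_key, name, timestamp] rows.
sub flatten_rows {
    my ($data_ref) = @_;
    my @rows;
    for my $outer_key ( keys %{$data_ref} ) {
        push @rows, map { [ $outer_key, @{$_} ] } @{ $data_ref->{$outer_key} };
    }
    return \@rows;
}

# Step 2: sort the rows by timestamp, then by name.
sub sort_rows {
    my ($rows_ref) = @_;
    my @sorted = sort { $a->[2] cmp $b->[2] || $a->[1] cmp $b->[1] } @{$rows_ref};
    return \@sorted;
}

my %simple = (
    'B2' => [ ['one',   '00:12:30'], ['two',  '00:09:30'] ],
    'C3' => [ ['three', '00:13:45'], ['adam', '00:09:30'] ],
);

my $sorted = sort_rows( flatten_rows( \%simple ) );
print join( ',', @{$_} ), "\n" for @{$sorted};
```

Each step can then be tested on its own, at the cost of one extra function name to choose.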


run_unit_tests.pl

Finally, let's implement some unit testing. This might be more than you were looking for with this question, but I think clean seams for testing are part of elegant code and I want to demonstrate that. Test::More is really great for this. I'll even throw in a test harness and formatter so we can get some elegant output. You can use TAP::Formatter::Console if you don't have TAP::Formatter::JUnit installed.

#!/usr/bin/env perl
use strict;
use warnings;
use TAP::Harness;

my $harness = TAP::Harness->new({
    formatter_class => 'TAP::Formatter::JUnit',
    merge           => 1,
    verbosity       => 1,
    normalize       => 1,
    color           => 1,
    timer           => 1,
});

$harness->runtests('t/helper.t');


t/helper.t

#!/usr/bin/env perl
use strict;
use warnings;
use Test::More;
use Utilities::Helper qw(sort_by_times_then_names convert_to_simple_format);

my %data = (
    'B2' => {
        'one' => {
            timestamp => '00:12:30'
        },
        'two' => {
            timestamp => '00:09:30'
        }
    },
    'C3' => {
        'three' => {
            timestamp => '00:13:45'
        },
        'adam' => {
            timestamp => '00:09:30'
        }
    }
);

my %formatted_data = %{ convert_to_simple_format( \%data ) };

my %expected_formatted_data = (
    'B2' => [['one', '00:12:30'],
             ['two', '00:09:30']],
    'C3' => [['three','00:13:45'],
             ['adam', '00:09:30']]
);

is_deeply(\%formatted_data, \%expected_formatted_data, "convert_to_simple_format test");

my @sorted_data = @{ sort_by_times_then_names( \%formatted_data ) };

my @expected_sorted_data = ( ['C3','adam', '00:09:30'],
                             ['B2','two',  '00:09:30'],
                             ['B2','one',  '00:12:30'],
                             ['C3','thee','00:13:45'] # intentional typo ('thee') to demonstrate the failure output
                            );

is_deeply(\@sorted_data, \@expected_sorted_data, "sort_by_times_then_names test");

done_testing;


Test Output

The nice thing about testing this way is that it will tell you what is wrong when a test fails.

<testsuites>
  <testsuite failures="1"
             errors="1"
             time="0.0478239059448242"
             tests="2"
             name="helper_t">
    <testcase time="0.0452120304107666"
              name="1 - convert_to_simple_format test"></testcase>
    <testcase time="0.000266075134277344"
              name="2 - sort_by_times_then_names test">
      <failure type="TestFailed"
               message="not ok 2 - sort_by_times_then_names test"><![CDATA[not ok 2 - sort_by_times_then_names test

#   Failed test 'sort_by_times_then_names test'
#   at t/helper.t line 45.
#     Structures begin differing at:
#          $got->[3][1] = 'three'
#     $expected->[3][1] = 'thee']]></failure>
    </testcase>
    <testcase time="0.00154280662536621" name="(teardown)" />
    <system-out><![CDATA[ok 1 - convert_to_simple_format test
not ok 2 - sort_by_times_then_names test

#   Failed test 'sort_by_times_then_names test'
#   at t/helper.t line 45.
#     Structures begin differing at:
#          $got->[3][1] = 'three'
#     $expected->[3][1] = 'thee'
1..2
]]></system-out>
    <system-err><![CDATA[Dubious, test returned 1 (wstat 256, 0x100)
]]></system-err>
    <error message="Dubious, test returned 1 (wstat 256, 0x100)" />
  </testsuite>
</testsuites>

In summary, I prefer readable and clear over concise. Sometimes you can make less efficient code that is easier to write and logically simpler. Putting ugly code inside functions is a great way to hide it! It isn't worth messing around with code to save 15ms when you run it. If your data set is large enough that performance becomes an issue, Perl might not be the right tool for the job. If you are really looking for some concise code, post a challenge over at the Code Golf Stack Exchange.
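
For comparison only, a more compact take on the flatten-and-sort from the question might look like the sketch below. It assumes the same %data layout and the same ordering (timestamp first, then inner key), and it is not meant as a recommendation over the more readable versions above.

```perl
use strict;
use warnings;

my %data = (
    'B2' => {
        'one' => { timestamp => '00:12:30' },
        'two' => { timestamp => '00:09:30' },
    },
    'C3' => {
        'three' => { timestamp => '00:13:45' },
        'adam'  => { timestamp => '00:09:30' },
    },
);

# Flatten and sort in one expression: timestamp first, then inner key.
my @lines =
    map  { join( ',', @{$_} ) }
    sort { $a->[0] cmp $b->[0] || $a->[2] cmp $b->[2] }
    map  {
        my $outer_key = $_;
        map { [ $data{$outer_key}{$_}{timestamp}, $outer_key, $_ ] }
            keys %{ $data{$outer_key} };
    }
    keys %data;

print "$_\n" for @lines;
```

The plain string comparison works here because the timestamps are zero-padded HH:MM:SS values; that is an assumption about the real data.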
