Sort mixed text lines (alphanum) in Perl

Question

I have a text file in which every line is structured like this:

P[containerVrsn:U(0)recordVrsn:U(0)size:U(212)ownGid:G[mdp:U(1090171666)**seqNo:U(81920)**]logicalDbNo:U(1)classVrsn:U(1)timeStamp:U(0)dbRecord:T[classNo:U(1064620)size:U(184)updateVersion:U(3)checksum:U(748981000)

I have to sort the file's lines by seqNo, from smallest to largest. The sequence number can be virtually any number starting from zero. Any idea how this can be done efficiently?

Recommended answer

The Schwartzian Transform suggested in Toto's answer is probably the fastest way to sort your lines here. But you said you're a Perl newbie, so I'd like to show how the lines can be sorted the traditional way.

Perl has a sort function that, by default, sorts a list in string (alphabetical) order. You can, however, supply a custom comparison function and let sort use it to compare the elements. While it runs, sort repeatedly compares two elements (= lines) of your list and has to decide whether the first is less than, equal to, or greater than the second.
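To see why the default order is not enough for sequence numbers, here is a minimal sketch with made-up numbers (the list is purely illustrative and not taken from the file in the question). The $a and $b inside the block are the two elements being compared, as explained in the next paragraph.

```perl
use strict;
use warnings;

# Illustrative numbers only; any list of integers would do.
my @numbers = (10, 9, 100, 1);

# Without a comparison function, sort compares the elements as strings.
my @as_strings = sort @numbers;

# With the numeric comparison operator <=>, they are compared as numbers.
my @as_numbers = sort { $a <=> $b } @numbers;

print "@as_strings\n";   # 1 10 100 9
print "@as_numbers\n";   # 1 9 10 100
```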

If you supply a comparison function, sort calls it with the two elements to compare in the variables $a and $b. You must not declare $a and $b yourself (for example with my); they are special package variables that sort sets for you. Your comparison function could look like this:

sub by_seqNo
{
    # extract the sequence number from $a and $b
    my ($seqA) = ($a =~ /seqNo:U\((\d+)/);
    my ($seqB) = ($b =~ /seqNo:U\((\d+)/);

    # numerically compare the sequence numbers (returns -1/0/+1)
    $seqA <=> $seqB;
}

The first two statements extract the digits that follow seqNo:U( and store them in $seqA and $seqB. The third statement compares these sequence numbers numerically and returns the result. Combined with the sort function, this gives:

my @sorted = sort by_seqNo @lines;
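Put together as a runnable script, the pieces above might look like the following sketch. The three sample lines are shortened, hypothetical stand-ins for the real records, and the script assumes that every line contains a seqNo:U(...) field; a line without one would leave the extracted value undefined and trigger a warning.

```perl
use strict;
use warnings;

# Comparison function from the answer: order two lines by their seqNo.
sub by_seqNo
{
    my ($seqA) = ($a =~ /seqNo:U\((\d+)/);
    my ($seqB) = ($b =~ /seqNo:U\((\d+)/);
    $seqA <=> $seqB;
}

# Hypothetical, shortened sample lines (not the real file contents).
my @lines = (
    "P[size:U(212)ownGid:G[mdp:U(1)seqNo:U(81920)]logicalDbNo:U(1)]\n",
    "P[size:U(212)ownGid:G[mdp:U(1)seqNo:U(7)]logicalDbNo:U(1)]\n",
    "P[size:U(212)ownGid:G[mdp:U(1)seqNo:U(1024)]logicalDbNo:U(1)]\n",
);

my @sorted = sort by_seqNo @lines;

# Prints the lines in the order seqNo 7, 1024, 81920.
print @sorted;
```

For a real file, @lines could instead be filled by opening the file with open and reading it in list context, for example my @lines = <$fh>;, which loads the whole file into memory.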

The Schwartzian Transform (ST) is faster than this solution because it performs the (expensive) extraction of the seqNo exactly once per line. The "traditional" approach, on the other hand, runs two extractions for every comparison (one for $a, one for $b), and sorting a large list takes many more comparisons than it has lines.
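For comparison, a Schwartzian Transform for the same task could be sketched as follows. This is an illustration of the general technique rather than the code from Toto's answer, and the sample lines are again shortened, hypothetical stand-ins that are all assumed to contain a seqNo field. Read the chain from the bottom up: each line is paired with its seqNo, the pairs are sorted by that number, and the original lines are then unpacked again.

```perl
use strict;
use warnings;

# Hypothetical, shortened sample lines (not the real file contents).
my @lines = (
    "P[size:U(212)ownGid:G[mdp:U(1)seqNo:U(81920)]logicalDbNo:U(1)]\n",
    "P[size:U(212)ownGid:G[mdp:U(1)seqNo:U(7)]logicalDbNo:U(1)]\n",
    "P[size:U(212)ownGid:G[mdp:U(1)seqNo:U(1024)]logicalDbNo:U(1)]\n",
);

my @sorted =
    map  { $_->[1] }                  # 3. unpack: keep only the original line
    sort { $a->[0] <=> $b->[0] }      # 2. compare the cached sequence numbers
    map  {                            # 1. extract the seqNo once per line
        my ($seq) = /seqNo:U\((\d+)/;
        [ $seq, $_ ];                 #    pair: [ seqNo, original line ]
    }
    @lines;

# Prints the lines in the order seqNo 7, 1024, 81920.
print @sorted;
```

The regular expression now runs once per line instead of twice per comparison, at the cost of building one small temporary array per line.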
