Regex value comparison

Question

I want to compare two numbers isolated from this sample data:

'gi|112807938|emb|CU075707.1|_Xenopus_tropicalis_finished_cDNA,_clone_TNeu129d01  C1:TCONS_00039972(XLOC_025068),_12.9045:32.0354,_Change:1.3118,_p:0.00025,_q:0.50752  C2:TCONS_00045925(XLOC_029835),_10.3694:43.8379,_Change:2.07985,_p:0.0004,_q:0.333824',
'gi|115528274|gb|BC124894.1|_Xenopus_laevis_islet-1,_mRNA_(cDNA_clone_MGC:154537_IMAGE:8320777),_complete_cds C1:TCONS_00080221(XLOC_049570),_17.9027:40.8136,_Change:1.18887,_p:0.00535,_q:0.998852  C2:TCONS_00092192(XLOC_059015),_17.8995:35.5534,_Change:0.990066,_p:0.0355,_q:0.998513',
'gi|118404233|ref|NM_001078963.1|_Xenopus_(Silurana)_tropicalis_pancreatic_lipase-related_protein_2_(pnliprp2),_mRNA  C1:TCONS_00031955(XLOC_019851),_0.944706:5.88717,_Change:2.63964,_p:0.01915,_q:0.998852 C2:TCONS_00036655(XLOC_023660),_2.31819:11.556,_Change:2.31757,_p:0.0358,_q:0.998513',

using the following regular expressions:

#!/usr/bin/perl -w
use strict; 
use File::Slurp;
use Data::Dumper;
$Data::Dumper::Sortkeys = 1;

my (@log_change, @largest_change);
        foreach (@intersect) {
            chomp;
            my @condition1_match = ($_ =~ /C1:.*?Change:(-?\d+\.\d+)|C1:.*?Change:(-?inf)/); # Sometimes the value is 'inf' or '-inf'. This allows either a numerical or inf value to be captured.
            my @condition2_match = ($_ =~ /C2:.*?Change:(-?\d+\.\d+)|C2:.*?Change:(-?inf)/);
            push @log_change, "@condition1_match\t@condition2_match";   
        }

    print Dumper (\@log_change);

This gives the following output:

          '1.3118   2.07985 ',
          '1.18887  0.990066 ',
          '2.63964  2.31757 ',

Ideally, within the same loop I now want to make a comparison between the values held in @condition1_match and @condition2_match such that the larger value is pushed onto a new array, unless comparing against a non numerical 'inf' in which case push the numerical value.

Something like this:

my (@log_change, @largest_change);
        foreach (@intersect) {
            chomp;
            my @condition1_match = ($_ =~ /C1:.*?Change:(-?\d+\.\d+)|C1:.*?Change:(-?inf)/);
            my @condition2_match = ($_ =~ /C2:.*?Change:(-?\d+\.\d+)|C2:.*?Change:(-?inf)/);
            push @log_change, "@condition1_match\t@condition2_match";
                unless ($_ =~ /Change:-?inf/) {
                    if (@condition1_match > @condition2_match) {
                        push @largest_change, @condition1_match;
                    }
                    else {
                        push @largest_change, @condition2_match;
                    }

                }

        }

    print Dumper (\@largest_change);

Which gives:

          '2.07985',
          undef,
          '0.990066',
          undef,
          '2.31757',
          undef,

along with many copies of this warning:

Use of uninitialized value $condition2_match[1] in join or string at intersect.11.8.pl line 114.

I'm unsure as to what exactly the error message means, as well as why I'm getting undef values in my @largest_change

Recommended answer

As you've written your code, @condition1_match and @condition2_match are each created with 2 elements (one per capture group in your regular expression) every time there is a match. Only one side of the alternation can match, so one of those elements is always undef, and interpolating the array into a string triggers the "Use of uninitialized value" warnings.

In this case, you can repair this program by putting the | inside the capture group:

my ($condition1_match) = ($_ =~ /C1:.*?Change:(-?\d+\.\d+|-?inf)/);
my ($condition2_match) = ($_ =~ /C2:.*?Change:(-?\d+\.\d+|-?inf)/);

so that there is a single capture group and the matching operation produces a list with a single, defined element.
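
To make the difference concrete, here is a minimal sketch that runs both forms of the regex against one line. The sample line is a shortened, hypothetical stand-in for the data in the question, and the variable names @two_groups, $c1 and $c2 are chosen for illustration only.

```perl
use strict;
use warnings;

# Shortened sample line (hypothetical, modelled on the data in the question).
my $line = 'C1:TCONS_1(XLOC_1),_12.9:32.0,_Change:1.3118,_p:0.00025  '
         . 'C2:TCONS_2(XLOC_2),_10.3:43.8,_Change:-inf,_p:0.0004';

# Two capture groups: a successful match returns one slot per group,
# and the group in the alternative that did not match is undef.
my @two_groups = $line =~ /C1:.*?Change:(-?\d+\.\d+)|C1:.*?Change:(-?inf)/;

# One capture group with the alternation inside it: a single defined value.
my ($c1) = $line =~ /C1:.*?Change:(-?\d+\.\d+|-?inf)/;
my ($c2) = $line =~ /C2:.*?Change:(-?\d+\.\d+|-?inf)/;

printf "two groups: %d elements, second is %s\n",
    scalar(@two_groups), defined $two_groups[1] ? 'defined' : 'undef';
print "one group:  C1=$c1 C2=$c2\n";
```

The parentheses around `$c1` and `$c2` on the left-hand side matter: they put the match in list context, so the variable receives the captured text rather than a true/false match result.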

Also, the comparison

if (@condition1_match > @condition2_match) {

is probably not doing what you think it is doing. In Perl, a numerical comparison between two arrays compares their lengths. After a successful match both arrays hold two elements, so the test is always false and the else branch pushes both elements of @condition2_match, undef included; that is where the undef entries in @largest_change come from. What you apparently mean to do is compare the defined value in each of those arrays, so with the original two-group regexes you would need to do something more cumbersome like:

my $condition1_match = $condition1_match[0] // $condition1_match[1];
my $condition2_match = $condition2_match[0] // $condition2_match[1];
if ($condition1_match > $condition2_match) {
    push @largest_change, $condition1_match;
} else {
    push @largest_change, $condition2_match;
}
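
The question also asks that an 'inf' or '-inf' value should lose to the numerical one, which the comparison above does not handle by itself. One possible way to combine the single-group regex with that rule is sketched below. The helper name pick_change, the sample lines in @intersect, and the decision to skip a line when both values are infinite are assumptions made for illustration, not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical helper: return the larger of two captured values, preferring
# the finite number whenever the other value is 'inf' or '-inf'.
sub pick_change {
    my ($x, $y) = @_;
    my $x_inf = $x =~ /^-?inf$/;
    my $y_inf = $y =~ /^-?inf$/;
    return    if $x_inf && $y_inf;   # assumption: nothing finite to report
    return $y if $x_inf;
    return $x if $y_inf;
    return $x > $y ? $x : $y;
}

# Shortened, made-up lines in the same shape as the question's data.
my @intersect = (
    'C1:a,_Change:1.3118,_p:0.1  C2:b,_Change:2.07985,_p:0.2',
    'C1:a,_Change:-inf,_p:0.1  C2:b,_Change:0.990066,_p:0.2',
    'C1:a,_Change:2.63964,_p:0.1  C2:b,_Change:inf,_p:0.2',
);

my @largest_change;
for my $line (@intersect) {
    my ($c1) = $line =~ /C1:.*?Change:(-?\d+\.\d+|-?inf)/;
    my ($c2) = $line =~ /C2:.*?Change:(-?\d+\.\d+|-?inf)/;
    next unless defined $c1 && defined $c2;   # skip lines missing a value
    my $best = pick_change($c1, $c2);
    push @largest_change, $best if defined $best;
}

print join("\n", @largest_change), "\n";
```

Testing the captured string against /^-?inf$/ keeps the infinity check explicit instead of relying on how Perl converts the string 'inf' to a number.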
