Retrieve patterns that exactly match all regex in Perl


Problem description


I have a database of subgraphs that looks like this:

t # 3-231, 1
v 0 94
v 1 14
v 2 16
v 3 17
u 0 1 2
u 0 2 2
u 0 3 2
t # 3-232, 1
v 0 14
v 1 94
v 2 19
v 3 91
u 0 1 2
u 0 3 2
u 1 2 2
t # 3-233, 1
v 0 17
v 1 91
v 2 16
v 3 94
u 0 1 2
u 0 3 2
u 1 2 2
t # 3-234, 1
v 0 90
v 1 93
v 2 102
v 3 95
u 0 1 2
u 0 3 2
u 1 2 2

I would like to retrieve all transactions that contain both of the following patterns: 'u 0 1 2' and 'u 0 2 2', along with the transaction id (e.g. the line that starts with t #).

I used the following code to accomplish this job:

#!/usr/bin/perl -w

use strict;

my $input = shift @ARGV or die $!; 

open (FILE, "$input") or die $!;

while (<FILE>) {
    my @fields = ('t', 'u\ 0\ 1', 'u\ 0\ 2');
    my $field_regex = join( "|", @fields );
    my @field_lines;

    push( @field_lines, $_ ) if ( /^(?:$field_regex) / );
    last if @field_lines == @fields;

    push @field_lines, "";

    print join( "\n", sort @field_lines );
}

close FILE;

However, it also retrieves transactions where only one of the lines matches, such as:

t # 3-231, 1
u 0 1 2
u 0 2 2
t # 3-232, 1
u 0 1 2
t # 3-233, 1
u 0 1 2
t # 3-233, 1
u 0 1 2

My ultimate goal is to retrieve only the transactions that match all of my regexes completely, such as

t # 3-231, 1
u 0 1 2
u 0 2 2

Thank you for your help!

Olha

Solution

One way: keep the current transaction-id on hand, and store lines of interest in an arrayref associated with that transaction-id key in a hash.

use warnings;
use strict;
use feature 'say';    
use Data::Dump qw(dd);

my @fields = ('u 0 1', 'u 0 2');  
my $field_regex = join '|', map { quotemeta } @fields;
    
my (%trans, $tid);

while (<>) {
    chomp;
    if (/^t #/) { 
        $tid = $_; 
        next;
    }   
  
    push @{$trans{$tid}}, $_  if /$field_regex/;
}

dd %trans;

# foreach my $tid (sort keys %trans) { 
#     say $tid;
#     say for @{$trans{$tid}};
# }

For simplicity I use while (<>), which reads, line by line, all files given on the command line when the program is invoked (or STDIN). I use Data::Dump to show the complex data structure; Data::Dumper in the core can do that as well.
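If you prefer to stay with core modules only, a minimal sketch of the same dump using Data::Dumper (it assumes the %trans hash built by the program above):

use Data::Dumper;

# assumes %trans from the program above;
# dump by reference so keys and their arrayrefs print as one structure
print Dumper(\%trans);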

The quotemeta escapes all ASCII non-"word" characters, which could otherwise throw off the regex, and this includes spaces.
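For example, a tiny standalone check (not part of the program above, just an illustration) of what quotemeta does to one of the patterns:

use strict;
use warnings;

my $pattern = 'u 0 1';
print quotemeta($pattern), "\n";   # prints: u\ 0\ 1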

In general the program above loses the order of transaction ids from the file, since hash keys are unordered, while it keeps the order of lines for each id, since those are in an array. This is not hard to remedy if needed.
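One way to remedy it, sketched here on the assumption that transaction ids are unique in the file, is to also record each id in an array as it is first seen and then iterate over that array instead of the hash keys (the use lines, @fields, and $field_regex setup from the program above are kept unchanged):

# assumes the same use lines and $field_regex setup as the program above
my (%trans, @order, $tid);

while (<>) {
    chomp;
    if (/^t #/) {
        $tid = $_;
        push @order, $tid;    # remember ids in file order
        next;
    }
    push @{$trans{$tid}}, $_  if /$field_regex/;
}

# print in the order the ids appeared in the file
for my $tid (@order) {
    say $tid;
    say for @{ $trans{$tid} // [] };
}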

Tested only with the provided data file.
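To get directly at the question's ultimate goal, namely only the transactions in which every pattern occurs, a possible post-processing sketch over the %trans hash (again reusing @fields and the say feature from the program above) is:

# assumes %trans and @fields from the program above
for my $tid (sort keys %trans) {
    my @lines = @{ $trans{$tid} };

    # a transaction qualifies only if every pattern matches at least one stored line
    my $has_all = 1;
    for my $f (@fields) {
        my $re = quotemeta $f;
        $has_all = 0 unless grep { /$re/ } @lines;
    }

    if ($has_all) {
        say $tid;
        say for @lines;
    }
}

With the data above, only transaction t # 3-231, 1 passes this check, which matches the desired output in the question.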
