Perl multiline regex for first 3 individual items


Problem description

I am trying to read a format with a regex in Perl. Sometimes, instead of a single line, I see the same format spread over 3 lines.

For the single-line format below, I can use the regex

/^\s*(.*)\s+([a-zA-Z0-9._]+)\s+(\d+)\s+(.*)/

to get the first 3 individual items in the line:

Hi There       FirstName.LastName    10  3/23/2011 2:46 PM

Below is the multi-line format I see. I am trying to use something like

/^\s*(.*)\n*\n*|\s+([a-zA-Z0-9._]+)\s+(\d+)\s+(.*)$/m

to get the individual items, but it doesn't seem to work.

Hi There    

                         FirstName-LastName       8       7/17/2015 1:15 PM 

Testing - 12323232323 Hello There

Any suggestions? Is a multi-line regex possible?

NOTE: In the same output I can see single-line records, multi-line records, or both, so the output can look like this:

Hello Line1 FirstName.LastName 10 3/23/2011 2:46 PM

Hello Line2

                         Line2FirstName-LastName       8       7/17/2015 1:15 PM 

Testing - 12323232323 Hello There

Hello Line3 Line3FirstName.LastName 8 3/21/2011 2:46 PM

Recommended answer

You can certainly apply a regex across multiple lines.

I've used the negated word class \W+ between words to match the spaces and newlines that separate them (for ASCII text, \W is equivalent to [^a-zA-Z0-9_]). The chat text is treated as a repeated \w+\W+ block.
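As a minimal sketch of that idea (the sample string and variable names are made up for illustration), the snippet below shows a repeated \w+\W+ block running across a blank line, because \W+ also consumes newlines:

```perl
use strict;
use warnings;

# Hypothetical sample: two words, a blank line, then an indented word.
my $text = "Hi There   \n\n     Next";

# Each \w+\W+ pair is one word plus the spaces or newlines that follow it.
# "Next" is left out because no non-word character comes after it.
my ($block) = $text =~ /^((?:\w+\W+)+)/;

# Trim the whitespace picked up by the last \W+.
(my $trimmed = $block) =~ s/\s+$//;

print "block covers: '$trimmed'\n";
```

The captured block still contains the newlines between the two lines; only the trailing whitespace is stripped.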

If you provide more specific input/output cases, I can refine the example code:

#!/usr/bin/env perl
use strict;
use warnings;

# Sample multi-line record from the question
my $input = <<'__END__';
Hi There    

                         FirstName-LastName       8       7/17/2015 1:15 PM 

Testing - 12323232323 Hello There
__END__

# Capture chat text, username, character count and timestamp in one match
my ($chat,$username,$chars,$timestamp) = $input =~ m/(?im)^\s*((?:\w+\W+)+)(\w+[-,\.]\w+)\W+(\d+)\W+([0-1]?\d\/[0-3]?\d\/[1-2]\d{3}\s+[0-2]?\d:[0-5]?\d\s?[ap]m)/
    or die "input did not match the expected format\n";

$chat =~ s/\s+$//;  # remove trailing whitespace (spaces and newlines)

print "chat -> ${chat}\n";
print "username -> ${username}\n";
print "chars -> ${chars}\n";
print "timestamp -> ${timestamp}\n";

Legend

  • m/^.../: a match regex (not a substitution), anchored at the start of a line
  • (?im): case-insensitive search and multiline mode (^ and $ also match at the start and end of each line)
  • \s*: matches zero or more whitespace characters (spaces, tabs, line breaks or form feeds)
  • ((?:\w+\W+)+): (match group $chat) matches one or more repetitions of a single word \w+ (letters, digits, '_') followed by non-word characters \W+ (everything that is not \w, including the newline \n). The result is trimmed afterwards to remove trailing whitespace
  • (\w+[-,\.]\w+): (match group $username) this is the weak point. If the username is not made of two regex words separated by a dash '-', a comma ',' (UPDATE) or a dot '.', the entire regex will not work properly (I took the dash and dot variants from your question; the format is not directly specified)
  • (\d+): (match group $chars) a number made of one or more digits
  • ([0-1]?\d\/[0-3]?\d\/[1-2]\d{3}\s+[0-2]?\d:[0-5]?\d\s?[ap]m): (match group $timestamp) this one is longer than the others, so it is split up here:
    • [0-1]?\d\/[0-3]?\d\/[1-2]\d{3} matches a date made of a month (with an optional leading zero), a day (with an optional leading zero) and a year from 1000 to 2999 (a deliberately relaxed constraint)
    • [0-2]?\d:[0-5]?\d\s?[ap]m matches the time: hour:minutes, an optional space and 'pm, PM, am, AM, Am, Pm...' thanks to the case-insensitive modifier above
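Since the question mentions that single-line and multi-line records can both appear, here is a hedged sketch that applies the same pattern to one record of each kind. The helper name parse_record, the variable names and the two sample strings are assumptions made for illustration; the regex itself is the one from the answer, and each string is assumed to hold exactly one record.

```perl
use strict;
use warnings;

# Same pattern as in the answer, compiled once with qr// so it can be reused.
my $record_re = qr/(?im)^\s*((?:\w+\W+)+)(\w+[-,\.]\w+)\W+(\d+)\W+([0-1]?\d\/[0-3]?\d\/[1-2]\d{3}\s+[0-2]?\d:[0-5]?\d\s?[ap]m)/;

# Hypothetical helper: returns (chat, username, chars, timestamp),
# or an empty list when the text does not match.
sub parse_record {
    my ($text) = @_;
    my @fields = $text =~ $record_re or return;
    $fields[0] =~ s/\s+$//;    # drop whitespace captured after the chat text
    return @fields;
}

# One single-line and one multi-line record, modelled on the question.
my $single = "Hi There       FirstName.LastName    10  3/23/2011 2:46 PM\n";
my $multi  = "Hi There    \n\n      FirstName-LastName       8       7/17/2015 1:15 PM \n";

for my $record ($single, $multi) {
    my ($chat, $username, $chars, $timestamp) = parse_record($record)
        or next;    # skip anything that does not look like a record
    print "$chat | $username | $chars | $timestamp\n";
}
```

Each call here receives exactly one record. If several records are concatenated in one string, the greedy ((?:\w+\W+)+) group would run on to the last username it can match, so the text would need to be split into records before parsing.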

