Perl multiline regex for first 3 individual items
Question
I am trying to parse a particular output format with a regex in Perl. Sometimes a record spans 3 lines instead of a single line.
For the single-line format below, I can use the regex
/^\s*(.*)\s+([a-zA-Z0-9._]+)\s+(\d+)\s+(.*)/
to get the first 3 individual items in the line:
Hi There FirstName.LastName 10 3/23/2011 2:46 PM
Below is the multi-line format I see. I am trying to use something like
/^\s*(.*)\n*\n*|\s+([a-zA-Z0-9._]+)\s+(\d+)\s+(.*)$/m
to get the individual items, but it doesn't seem to work.
Hi There
FirstName-LastName 8 7/17/2015 1:15 PM
Testing - 12323232323 Hello There
Any suggestions? Is a multi-line regex possible?
NOTE: The same output can contain single-line records, multi-line records, or both, so it can look like this:
Hello Line1 FirstName.LastName 10 3/23/2011 2:46 PM
Hello Line2
Line2FirstName-LastName 8 7/17/2015 1:15 PM
Testing - 12323232323 Hello There
Hello Line3 Line3FirstName.LastName 8 3/21/2011 2:46 PM
Recommended answer
You can certainly apply a regex over multiple lines.
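As a side note on why the pattern from the question does not return all the items: the `|` splits the whole pattern into two independent alternatives, so the first alternative alone succeeds at the start of the input and the remaining groups are never filled. The character class `[a-zA-Z0-9._]` also has no dash, so it cannot match `FirstName-LastName` as one item. A minimal sketch (the `@got` variable is only illustrative):

```perl
use strict;
use warnings;

my $input = "Hi There\nFirstName-LastName 8 7/17/2015 1:15 PM\n";

# The pattern from the question, unchanged. Because of the top-level '|',
# it means "first half OR second half", not "first half, then second half".
my @got = $input =~ /^\s*(.*)\n*\n*|\s+([a-zA-Z0-9._]+)\s+(\d+)\s+(.*)$/m;

# Only the first group is set; the others stay undefined
for my $i (0 .. $#got) {
    my $value = defined $got[$i] ? "'$got[$i]'" : 'undef';
    print 'group ', $i + 1, " -> $value\n";
}
```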
I've used the negated word class \W+ between words to match both the spaces and the newlines that separate them (for ASCII input, \W is equivalent to [^a-zA-Z0-9_]).
The chat text is treated as a repeated \w+\W+ block.
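To illustrate the point about \W+, here is a minimal sketch on a made-up two-line string (the names `$text`, `$block` and `$shown` are only for this example). Since \W+ matches the newline as well as the space, the repeated block runs across the line break:

```perl
use strict;
use warnings;

# Hypothetical sample: words separated by a space and by a newline
my $text = "Hi There\nFirstName";

# Each \w+\W+ block is one word plus whatever non-word characters follow it.
# The last word has nothing after it, so it is not part of the block.
my ($block) = $text =~ /^((?:\w+\W+)+)/;

# Make the newline visible when printing
(my $shown = $block) =~ s/\n/\\n/g;
print "block -> '$shown'\n";
```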
If you provide more specific input/output cases, I can refine the example code:
#!/usr/bin/env perl
use strict;
use warnings;

my $input = <<'__END__';
Hi There
FirstName-LastName 8 7/17/2015 1:15 PM
Testing - 12323232323 Hello There
__END__

# Capture the chat text, username, character count and timestamp in one match
my ($chat,$username,$chars,$timestamp) = $input =~ m/(?im)^\s*((?:\w+\W+)+)(\w+[-,\.]\w+)\W+(\d+)\W+([0-1]?\d\/[0-3]?\d\/[1-2]\d{3}\s+[0-2]?\d:[0-5]?\d\s?[ap]m)/
    or die "input did not match\n";
$chat =~ s/\s+$//; # remove trailing whitespace
print "chat -> ${chat}\n";
print "username -> ${username}\n";
print "chars -> ${chars}\n";
print "timestamp -> ${timestamp}\n";
Legend
- m/^.../ : match regex (not a substitution), starting from the start of a line
- (?im) : case-insensitive search and multiline mode (^ and $ also match at the start and end of each line)
- \s* : matches zero or more whitespace characters (spaces, tabs, line breaks or form feeds)
- ((?:\w+\W+)+) : (match group $chat) matches one or more repetitions of a single word \w+ (letters, digits, '_') followed by non-word characters \W+ (everything that is not \w, including the newline \n). The result is trimmed afterwards to remove trailing whitespace.
- (\w+[-,\.]\w+) : (match group $username) this is the weak point. If the username is not composed of two regex words separated by a dash '-', a comma ',' or (UPDATE) a dot '.', the entire regex cannot work properly (I've taken both possibilities from your question; the format is not directly specified).
- (\d+) : (match group $chars) a number composed of one or more digits
- ([0-1]?\d\/[0-3]?\d\/[1-2]\d{3}\s+[0-2]?\d:[0-5]?\d\s?[ap]m) : (match group $timestamp) this one is longer than the others, so let's split it up:
- [0-1]?\d\/[0-3]?\d\/[1-2]\d{3} : matches a date composed of a month (with an optional leading zero), a day (with an optional leading zero) and a year from 1000 to 2999 (a relaxed constraint :)
- [0-2]?\d:[0-5]?\d\s?[ap]m : matches the time: hour:minutes, an optional space and 'pm, PM, am, AM, Am, Pm...', thanks to the case-insensitive modifier above
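For the mixed output shown in the NOTE of the question, the greedy chat group above would run from the first record into the last one. The following is only a sketch of one possible variation, not part of the code above: it assumes the chat text never spans more than one line, keeps the chat group lazy, and collects every record with /g. The names `$record` and `@records` are illustrative, and the /x flag is used so the pieces from the legend can sit on separate lines.

```perl
use strict;
use warnings;

# Sample taken from the NOTE in the question: both layouts mixed together
my $input = <<'__END__';
Hello Line1 FirstName.LastName 10 3/23/2011 2:46 PM
Hello Line2
Line2FirstName-LastName 8 7/17/2015 1:15 PM
Testing - 12323232323 Hello There
Hello Line3 Line3FirstName.LastName 8 3/21/2011 2:46 PM
__END__

# Assumption: the chat text never spans more than one line
my $record = qr{
    ^ [^\S\n]*            # start of a line, optional spaces or tabs
    ( \S [^\n]*? )        # chat text, lazy, confined to one line
    \s+                   # separator, may include the line break
    ( \w+ [-,.] \w+ )     # username, two words joined by dash, comma or dot
    \s+ ( \d+ )           # character count
    \s+ ( [0-1]?\d / [0-3]?\d / [1-2]\d{3} \s+ [0-2]?\d : [0-5]?\d \s? [ap]m )
}xmi;

# Walk through the input and keep one hash per matched record
my @records;
while ($input =~ /$record/g) {
    push @records, { chat => $1, username => $2, chars => $3, timestamp => $4 };
}

for my $r (@records) {
    print "chat -> $r->{chat}\n";
    print "username -> $r->{username}\n";
    print "chars -> $r->{chars}\n";
    print "timestamp -> $r->{timestamp}\n\n";
}
```

With this sample, a line such as "Testing - 12323232323 Hello There" is skipped because no username, count and timestamp follow it on the same or the next line. If the chat text can really span several lines, the one-line assumption has to be relaxed.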