Regex with negative lookahead across multiple lines
Problem description
For the past few hours I've been trying to match address(es) from the following sample data and I can't get it to work:
medicalHistory None
address 24 Lewin Street, KUBURA,
NSW, Australia
email MaryBeor@spambob.com
address 16 Yarra Street,
LAWRENCE, VIC, Australia
name Mary Beor
medicalHistory None
phone 00000000000000000000353336907
birthday 26-11-1972
My plan was to find anything that starts with "address", is followed by whitespace, then by letters, digits, commas and newlines, and ends at a newline that is followed by a word character. I came up with the following (and many variations of it):
address\s+([0-9a-zA-Z, \n\t]+)(?!\n\w)
Unfortunately that matches the following:
address 24 Lewin Street, KUBURA,
NSW, Australia
email MaryBeor
and
address 16 Yarra Street,
LAWRENCE, VIC, Australia
name Mary Beor
medicalHistory None
phone 00000000000000000000353336907
birthday 26
instead of
address 24 Lewin Street, KUBURA,
NSW, Australia
and
address 16 Yarra Street,
LAWRENCE, VIC, Australia
Can you please tell me what I'm doing wrong?
Recommended answer
I would do it this way:
address\s+((?![\r\n]+\w)[0-9a-zA-Z, \r\n\t])+
See it here on Regexr.
The important part is ((?![\r\n]+\w)[0-9a-zA-Z, \r\n\t])+. It says: match the next character from [0-9a-zA-Z, \r\n\t], but only if the negative lookahead (?![\r\n]+\w) succeeds at that position, that is, only if the text there is not one or more line breaks immediately followed by a word character. This matches what you expect.
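As a rough illustration, the pattern above can be tried in Python. This is only a sketch under some assumptions that are not in the original post: the sample text uses "\n" line endings, the continuation lines of an address start with spaces (which is what lets the lookahead tell them apart from the next field), and an extra outer capturing group has been added so that the whole address is captured rather than just the last repeated character. The name find_addresses is hypothetical.

```python
import re

# Assumed sample: "\n" line endings, continuation lines indented with spaces.
SAMPLE = (
    "medicalHistory None\n"
    "address 24 Lewin Street, KUBURA,\n"
    "    NSW, Australia\n"
    "email MaryBeor@spambob.com\n"
    "address 16 Yarra Street,\n"
    "    LAWRENCE, VIC, Australia\n"
    "name Mary Beor\n"
    "medicalHistory None\n"
    "phone 00000000000000000000353336907\n"
    "birthday 26-11-1972\n"
)

# The answer's pattern. The inner group is made non-capturing and wrapped in
# one outer group (an addition for this sketch) so group(1) is the full address.
ADDRESS = re.compile(r"address\s+((?:(?![\r\n]+\w)[0-9a-zA-Z, \r\n\t])+)")


def find_addresses(text):
    """Return every address, with runs of whitespace collapsed to one space."""
    return [" ".join(m.group(1).split()) for m in ADDRESS.finditer(text)]


if __name__ == "__main__":
    for address in find_addresses(SAMPLE):
        print(address)
```

The lookahead is tested before every single character, so the repetition ends at the first line break that is directly followed by a word character, which here is the start of the next field name.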
In both of your cases the regex only stopped matching when it reached a character that is not included in your character class. If you want to go that way, then you would need to combine a lazy quantifier with a positive lookahead:
address\s+([0-9a-zA-Z, \n\r\t]+?)(?=\r\w)
[0-9a-zA-Z, \n\r\t]+? matches as few characters as possible until the condition (?=\r\w) is true.
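The following Python sketch compares the question's pattern with the lazy variant. It rests on assumptions that are not in the original post: the sample uses "\n" line endings and indented continuation lines. Note that (?=\r\w) only succeeds when lines are separated by a bare carriage return, so on "\n" data it finds nothing. The LAZY_ANY pattern, which accepts any mix of "\r" and "\n", is a suggested adaptation rather than part of the answer, and the helper name captures is hypothetical.

```python
import re

# Assumed sample: "\n" line endings, continuation lines indented with spaces.
SAMPLE = (
    "medicalHistory None\n"
    "address 24 Lewin Street, KUBURA,\n"
    "    NSW, Australia\n"
    "email MaryBeor@spambob.com\n"
    "address 16 Yarra Street,\n"
    "    LAWRENCE, VIC, Australia\n"
    "name Mary Beor\n"
    "medicalHistory None\n"
    "phone 00000000000000000000353336907\n"
    "birthday 26-11-1972\n"
)

# Pattern from the question: greedy class, negative lookahead only at the end.
ORIGINAL = r"address\s+([0-9a-zA-Z, \n\t]+)(?!\n\w)"
# Pattern from the answer: lazy class, lookahead for a carriage return.
LAZY_CR = r"address\s+([0-9a-zA-Z, \n\r\t]+?)(?=\r\w)"
# Adaptation (assumption, not from the answer): accept "\r", "\n" or "\r\n".
LAZY_ANY = r"address\s+([0-9a-zA-Z, \n\r\t]+?)(?=[\r\n]+\w)"


def captures(pattern, text):
    """Return the text captured by group 1 for every match of pattern."""
    return re.findall(pattern, text)


if __name__ == "__main__":
    # The greedy class runs on until '@' or '-', which are not in the class;
    # at that point (?!\n\w) is trivially satisfied, so the overlong match stands.
    for hit in captures(ORIGINAL, SAMPLE):
        print(repr(hit))
    # No carriage returns in SAMPLE, so the "\r" lookahead never succeeds.
    print(captures(LAZY_CR, SAMPLE))
    # With the broader lookahead, each match stops before the next field.
    for hit in captures(LAZY_ANY, SAMPLE):
        print(repr(hit))
```

The lazy quantifier grows the match one character at a time and checks the lookahead after each step, so the match stops at the first line break that is followed directly by a word character.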