Regex with negative lookahead across multiple lines

Question

For the past few hours I've been trying to match address(es) from the following sample data and I can't get it to work:

medicalHistory      None
address             24 Lewin Street, KUBURA, 
                NSW, Australia
email               MaryBeor@spambob.com


address             16 Yarra Street, 
                                     LAWRENCE, VIC, Australia
name                Mary   Beor
medicalHistory      None
phone               00000000000000000000353336907
birthday            26-11-1972

My plan was to find anything that starts with "address", is followed by any whitespace and then by characters, numbers, commas and newlines, and ends at a newline followed by a character. I came up with the following (and many variations of it):

address\s+([0-9a-zA-Z, \n\t]+)(?!\n\w)

Unfortunately that matches the following:

address             24 Lewin Street, KUBURA,
                NSW, Australia
email               MaryBeor  

address             16 Yarra Street,
                                 LAWRENCE, VIC, Australia
name                Mary   Beor
medicalHistory      None
phone               00000000000000000000353336907
birthday            26

instead of

address             24 Lewin Street, KUBURA, 
                NSW, Australia

address             16 Yarra Street,
                                 LAWRENCE, VIC, Australia
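
A minimal Python reproduction of this behaviour, using the standard `re` module. It assumes the sample is stored with `\n` line endings; the names `SAMPLE` and `ORIGINAL` are only for illustration. It prints each captured group and the character the match stopped in front of.

```python
import re

# Sample records from the question ("\n" line endings are an assumption).
SAMPLE = "\n".join([
    "medicalHistory      None",
    "address             24 Lewin Street, KUBURA, ",
    "                NSW, Australia",
    "email               MaryBeor@spambob.com",
    "",
    "",
    "address             16 Yarra Street, ",
    "                                     LAWRENCE, VIC, Australia",
    "name                Mary   Beor",
    "medicalHistory      None",
    "phone               00000000000000000000353336907",
    "birthday            26-11-1972",
])

# The pattern from the question.
ORIGINAL = r"address\s+([0-9a-zA-Z, \n\t]+)(?!\n\w)"

for m in re.finditer(ORIGINAL, SAMPLE):
    # The greedy [...]+ runs across line breaks into the following fields and
    # only stops at a character outside the class; (?!\n\w) is tested there.
    print(repr(m.group(1)))
    print("stopped before:", repr(SAMPLE[m.end()]))
```

The two matches stop just before the `@` of the e-mail address and the `-` of the birthday, which are the first characters not in the class.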

Can you please tell me what I'm doing wrong?

Answer

I would do it like this:

address\s+((?![\r\n]+\w)[0-9a-zA-Z, \r\n\t])+

See it here on Regexr.

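This part, ((?![\r\n]+\w)[0-9a-zA-Z, \r\n\t])+, is the important one: it says "match the next character from [0-9a-zA-Z, \r\n\t], but only if (?![\r\n]+\w) does not match at this position", that is, only if the text ahead is not a line break followed by a word character (the start of the next field). This matches what you expect.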
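
A sketch of this pattern in Python's `re` module, assuming `\n` line endings and reusing the question's sample data (the names `SAMPLE` and `PATTERN` are illustrative). This construction, a lookahead checked before every single character, is often called a tempered greedy token.

```python
import re

# Sample records from the question ("\n" line endings are an assumption).
SAMPLE = "\n".join([
    "medicalHistory      None",
    "address             24 Lewin Street, KUBURA, ",
    "                NSW, Australia",
    "email               MaryBeor@spambob.com",
    "",
    "",
    "address             16 Yarra Street, ",
    "                                     LAWRENCE, VIC, Australia",
    "name                Mary   Beor",
    "medicalHistory      None",
    "phone               00000000000000000000353336907",
    "birthday            26-11-1972",
])

# Take one character from the class, but only if the text ahead is not
# "line break(s) followed by a word character", i.e. the next field.
PATTERN = r"address\s+((?![\r\n]+\w)[0-9a-zA-Z, \r\n\t])+"

for m in re.finditer(PATTERN, SAMPLE):
    # group(0) is the whole address. group(1) only holds the last character,
    # because a repeated capturing group keeps just its final iteration.
    print(repr(m.group(0)))
```

Both matches end at "Australia"; the indented continuation lines are included because a line break followed by spaces does not trigger the lookahead.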

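In both of your cases the regex stopped matching at a character that is not included in your character class (the @ in the e-mail address and the - in the birthday). The negative lookahead (?!\n\w) is only tested once, after the greedy + has already consumed everything it can, so it never cuts the match short. If you want to go that way, then you need to combine a lazy quantifier and a positive lookahead: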

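address\s+([0-9a-zA-Z, \n\r\t]+?)(?=[\r\n]+\w)

[0-9a-zA-Z, \n\r\t]+? matches as few characters as possible, until the condition (?=[\r\n]+\w), a line break followed by a word character, is true. Writing the lookahead with [\r\n]+ rather than just \r makes it work with \n, \r\n and \r line endings alike.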
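
A sketch of the lazy-quantifier version in Python's `re` module, assuming `\n` line endings (the names `SAMPLE`, `PATTERN` and `CR_ONLY` are illustrative). Because such text contains no `\r`, the lookahead is written as `[\r\n]+\w`, the same test used in the first pattern; the last line shows that a `\r`-only lookahead finds nothing in this text.

```python
import re

# Sample records from the question ("\n" line endings are an assumption).
SAMPLE = "\n".join([
    "medicalHistory      None",
    "address             24 Lewin Street, KUBURA, ",
    "                NSW, Australia",
    "email               MaryBeor@spambob.com",
    "",
    "",
    "address             16 Yarra Street, ",
    "                                     LAWRENCE, VIC, Australia",
    "name                Mary   Beor",
    "medicalHistory      None",
    "phone               00000000000000000000353336907",
    "birthday            26-11-1972",
])

# Lazy class: grow one character at a time and stop at the first position
# where the lookahead sees a line break followed by a word character.
PATTERN = r"address\s+([0-9a-zA-Z, \n\r\t]+?)(?=[\r\n]+\w)"

for m in re.finditer(PATTERN, SAMPLE):
    print(repr(m.group(1)))

# A lookahead of (?=\r\w) can never succeed on "\n"-only text.
CR_ONLY = r"address\s+([0-9a-zA-Z, \n\r\t]+?)(?=\r\w)"
print(re.search(CR_ONLY, SAMPLE))  # prints None
```

Unlike the tempered-token version, here group(1) holds the complete address, since the capturing group wraps the whole quantified class.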

See it on Regexr.
