Regex with negative lookahead across multiple lines

Views: 259 | Published: 2020/5/25 0:23:25 | Tags: regex, parsing, lookahead


Question

For the past few hours I've been trying to match the address(es) in the following sample data, and I can't get it to work:

medicalHistory      None
address             24 Lewin Street, KUBURA, 
                NSW, Australia
email               MaryBeor@spambob.com


address             16 Yarra Street, 
                                     LAWRENCE, VIC, Australia
name                Mary   Beor
medicalHistory      None
phone               00000000000000000000353336907
birthday            26-11-1972

My plan was to find anything that starts with "address", is followed by whitespace, then by letters, digits, commas, and newlines, and ends at a newline that is followed by a word character. I came up with the following (and many variations of it):

address\s+([0-9a-zA-Z, \n\t]+)(?!\n\w)

Unfortunately, that matches the following:

address             24 Lewin Street, KUBURA,
                NSW, Australia
email               MaryBeor

and

address             16 Yarra Street,
                                 LAWRENCE, VIC, Australia
name                Mary   Beor
medicalHistory      None
phone               00000000000000000000353336907
birthday            26

instead of

address             24 Lewin Street, KUBURA, 
                NSW, Australia

and

address             16 Yarra Street,
                                 LAWRENCE, VIC, Australia

Can you please tell me what I'm doing wrong?

Recommended answer

I would do it this way:

address\s+((?![\r\n]+\w)[0-9a-zA-Z, \r\n\t])+

See it here on Regexr.

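The important part is ((?![\r\n]+\w)[0-9a-zA-Z, \r\n\t])+. Before each character is consumed, the negative lookahead (?![\r\n]+\w) is checked at that position: the next character from [0-9a-zA-Z, \r\n\t] is matched only if what follows there is not one or more line breaks and then a word character. A line break followed by indentation is therefore still consumed, while a line break followed by a new field name ends the match. This matches what you expect.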
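As a rough illustration of the pattern above, the following Python sketch runs it over the sample data from the question. The `SAMPLE` string is retyped here with `\n` line endings, and the names `SAMPLE`, `TEMPERED`, and `matches` are made up for this example.

```python
import re

# Sample data from the question, retyped with "\n" line endings (an assumption).
SAMPLE = (
    "medicalHistory      None\n"
    "address             24 Lewin Street, KUBURA, \n"
    "                NSW, Australia\n"
    "email               MaryBeor@spambob.com\n"
    "\n"
    "\n"
    "address             16 Yarra Street, \n"
    "                                     LAWRENCE, VIC, Australia\n"
    "name                Mary   Beor\n"
    "medicalHistory      None\n"
    "phone               00000000000000000000353336907\n"
    "birthday            26-11-1972\n"
)

# The negative lookahead is re-checked before every single character,
# so the match ends just before a line break that starts a new field.
TEMPERED = re.compile(r"address\s+((?![\r\n]+\w)[0-9a-zA-Z, \r\n\t])+")

# group(0) is the whole match; group(1) only keeps the last repetition.
matches = [m.group(0) for m in TEMPERED.finditer(SAMPLE)]

for text in matches:
    print(repr(text))
```

Because the capturing group is repeated, `group(1)` holds only the final character matched, so the sketch reads the full match from `group(0)`. Wrapping the repeated group in an outer capturing group would be one way to capture the address alone.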

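In both of your cases the regex stopped matching only because it reached a character that is not in your character class ("@" in the first match, "-" in the second). The negative lookahead (?!\n\w) is checked just once, after the greedy character class has already consumed as much as it can, so it never ends the match early. If you want to go that way, then you need to combine a lazy quantifier with a positive lookahead: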

address\s+([0-9a-zA-Z, \n\r\t]+?)(?=\r\w)

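[0-9a-zA-Z, \n\r\t]+? matches as few characters as possible, extending one character at a time until the condition (?=\r\w) is true. Note that \r matches only a carriage return, so this lookahead assumes lines separated by a bare \r; for input that uses \n or \r\n line endings, use (?=[\r\n]+\w) instead.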
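The two points above can be checked with a small Python sketch. It assumes the sample data uses `\n` line endings, so the lookahead is written as `(?=[\r\n]+\w)` rather than the `(?=\r\w)` shown in the answer. The names `SAMPLE`, `ORIGINAL`, `LAZY`, `stop_chars`, and `addresses` are hypothetical.

```python
import re

# Sample data from the question, retyped with "\n" line endings (an assumption).
SAMPLE = (
    "medicalHistory      None\n"
    "address             24 Lewin Street, KUBURA, \n"
    "                NSW, Australia\n"
    "email               MaryBeor@spambob.com\n"
    "\n"
    "\n"
    "address             16 Yarra Street, \n"
    "                                     LAWRENCE, VIC, Australia\n"
    "name                Mary   Beor\n"
    "medicalHistory      None\n"
    "phone               00000000000000000000353336907\n"
    "birthday            26-11-1972\n"
)

# The question's pattern: greedy class, then a lookahead checked only once.
ORIGINAL = re.compile(r"address\s+([0-9a-zA-Z, \n\t]+)(?!\n\w)")
# The character right after each match is what actually stopped the class.
stop_chars = [SAMPLE[m.end()] for m in ORIGINAL.finditer(SAMPLE)]

# Lazy quantifier plus positive lookahead. Assumption: [\r\n]+ replaces the
# answer's \r so that the "\n" line endings of SAMPLE are recognised.
LAZY = re.compile(r"address\s+([0-9a-zA-Z, \n\r\t]+?)(?=[\r\n]+\w)")
# Collapse the line break and indentation inside each captured address.
addresses = [" ".join(m.group(1).split()) for m in LAZY.finditer(SAMPLE)]

print(stop_chars)
print(addresses)
```

Collapsing whitespace with `split()` and `join()` is an extra step added for readability and is not part of the original answer.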

See it on Regexr.
