Regex match closest <br> tags with a group of words in between
Problem description
I have been trying to figure this out, to no avail. I have looked at many resources online; some get close, but none is exact. Let's say I have the following text:
<br>
Message 1
<br>
<br>
Here is Message 2
<br>
<br>
Here is Message 2 (again)
<br>
What I want to do is return every Message 2, together with the text between the closest <br> tags around it. The following regex is close:
<br>[\s\S]*?Message 2[\s\S]*?<br>
However, it returns the following two blocks. Block 1:
<br>
Message 1
<br>
<br>
Here is Message 2
<br>
Block 2:
<br>
Here is Message 2 (again)
<br>
However, I need block 1 to be:
<br>
Here is Message 2
<br>
The messages I receive are always presented in this manner, so I don't think I need an HTML parser.
Recommended answer
Try this regex pattern:
<br>((?!<br>)[\s\S])*Message 2((?!<br>)[\s\S])*<br>
The trick here is to temper the dot (written as [\s\S] in the pattern above so that it also matches newlines) with a negative lookahead, which asserts that what follows is not a <br> tag. In other words, ((?!<br>)[\s\S])* consumes everything up to, but not including, the next <br> tag.
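As a quick check, the two patterns can be compared side by side on the sample text from the question. This is only a minimal sketch in Python, which the original post does not mention. The variable names are made up for illustration, and the groups in the tempered pattern are written as non-capturing (?:...) purely so that re.findall returns whole matches. Other regex flavors may behave slightly differently.

```python
import re

# Sample input copied from the question.
text = """<br>
Message 1
<br>
<br>
Here is Message 2
<br>
<br>
Here is Message 2 (again)
<br>"""

# Lazy pattern from the question. The first [\s\S]*? is free to run across
# intermediate <br> tags until it finds "Message 2".
lazy = re.compile(r"<br>[\s\S]*?Message 2[\s\S]*?<br>")

# Tempered pattern from the answer. Each character is consumed only if a
# <br> tag does not start at that position. The groups are non-capturing
# here (an assumption for convenience, not part of the original answer).
tempered = re.compile(r"<br>(?:(?!<br>)[\s\S])*Message 2(?:(?!<br>)[\s\S])*<br>")

lazy_blocks = lazy.findall(text)
tempered_blocks = tempered.findall(text)

# Print each block found by the tempered pattern, separated by a rule.
for block in tempered_blocks:
    print(block)
    print("---")
```

With this input, the first lazy match still begins at the very first <br> and therefore includes Message 1, whereas each tempered match begins at the <br> immediately before its Message 2 line.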
As a disclaimer, in general we should not use regex to parse HTML data. Sometimes we are forced to do this, e.g. if we are using an editor like Notepad++, which doesn't have an HTML parser.