Regex match closest <br> tags with a group of words in between

Question

I have been trying to figure this out, to no avail. I have looked at many resources online; some get close, but none is exact. Let's say I have the following text:

<br>
Message 1
<br>
<br>
Here is Message 2
<br>
<br>
Here is Message 2 (again)
<br>

What I want to do is return every occurrence of "Message 2" together with the text between its closest enclosing <br> tags. The following regex is close:

<br>[\s\S]*?Message 2[\s\S]*?<br>

However, it returns the following two blocks. Block 1:

<br>
Message 1
<br>
<br>
Here is Message 2
<br>

Block 2:

<br>
Here is Message 2 (again)
<br>

However, I need block 1 to return only:

<br>
Here is Message 2
<br>

The messages I receive are always presented in this manner, so I don't think I need an HTML parser.

Recommended answer

Try this regex pattern:

<br>((?!<br>)[\s\S])*Message 2((?!<br>)[\s\S])*<br>

The trick here is to temper the .* with a negative lookahead, which asserts that a <br> tag does not begin at the current position. In other words, ((?!<br>).)* consumes everything up to, but excluding, the next <br> tag. The pattern above uses [\s\S] in place of . so that each step can also match a newline.
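
To compare the two patterns side by side, here is a minimal sketch. It assumes Python's re module, since the question does not say which regex engine is in use, and the variable names are illustrative. The groups are written as non-capturing, (?:...), so that findall returns whole matches; otherwise the tempered pattern is the one from the answer.

```python
import re

# Sample text from the question.
text = (
    "<br>\n"
    "Message 1\n"
    "<br>\n"
    "<br>\n"
    "Here is Message 2\n"
    "<br>\n"
    "<br>\n"
    "Here is Message 2 (again)\n"
    "<br>\n"
)

# Lazy pattern from the question: the first [\s\S]*? is free to run
# across intermediate <br> tags until it reaches "Message 2".
lazy = re.compile(r"<br>[\s\S]*?Message 2[\s\S]*?<br>")

# Tempered pattern from the answer: a character is consumed only if
# a <br> tag does not start at that position.
tempered = re.compile(
    r"<br>(?:(?!<br>)[\s\S])*Message 2(?:(?!<br>)[\s\S])*<br>"
)

lazy_matches = lazy.findall(text)
tempered_matches = tempered.findall(text)

for label, matches in (("lazy", lazy_matches), ("tempered", tempered_matches)):
    print(label)
    for m in matches:
        print(repr(m))
```

With the lazy pattern, the first match still starts at the very first <br> and therefore includes "Message 1"; with the tempered pattern, each match is confined to a single <br> ... <br> pair.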

As a disclaimer, in general we should not use regex to parse HTML data. Sometimes we are forced to, e.g. when working in an editor like Notepad++, which doesn't have an HTML parser.
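
For cases where a parser is available, the following is a rough sketch of the same extraction without regex. It assumes Python's standard html.parser module; the class name BrSegmentCollector and its attributes are hypothetical, introduced only for illustration, and the input is assumed to have the flat <br>-delimited shape shown in the question.

```python
from html.parser import HTMLParser


class BrSegmentCollector(HTMLParser):
    """Collects the text found between consecutive <br> tags (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.segments = []     # finished text segments, whitespace-trimmed
        self._buffer = []      # text pieces seen since the last <br>
        self._seen_br = False  # True once the first <br> has been seen

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            if self._seen_br:
                # A previous <br> exists, so the buffer is a closed segment.
                self.segments.append("".join(self._buffer).strip())
            self._buffer = []
            self._seen_br = True

    def handle_data(self, data):
        self._buffer.append(data)


text = (
    "<br>\nMessage 1\n<br>\n"
    "<br>\nHere is Message 2\n<br>\n"
    "<br>\nHere is Message 2 (again)\n<br>\n"
)

parser = BrSegmentCollector()
parser.feed(text)
parser.close()

# Keep only the segments that mention the phrase of interest.
matches = [s for s in parser.segments if "Message 2" in s]
print(matches)
```

Unlike the regex, this returns only the text between the tags, not the <br> tags themselves.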
