Regex match text between paragraph tags


Question

I'm attempting to match only the content between opening/closing paragraph tags. Playing around with it on RegExr, I can get <p.*?> to match an opening paragraph tag that may or may not have any additional attributes such as class and/or ID.

However, when I add that pattern to a positive lookbehind, it breaks and I'm not sure why. I've tried escaping the < and > symbols, but that doesn't seem to help. The lookahead, however, works perfectly.

Here's an example of the entire pattern:

(?<=\<p.*?\>).*?(?=</p>)

I'd like to match only the content within the paragraph tags, and not include the tags themselves. That is why I was attempting to use lookaheads and lookbehinds.

Answer

Problem

The problem with lookbehinds is that most regex engines require them to be fixed-width, so unbounded repetition such as * or + is not allowed inside them.

(?<=.*)

This is invalid because of the * quantifier. If it were {8} instead, it would be okay, since that is a fixed width.
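
As a quick illustration of that restriction, here is a minimal sketch using Python's built-in re module as one example of an engine that insists on fixed-width lookbehinds. The helper name compiles is made up for this example, and other engines may behave differently.

```python
import re

def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern (hypothetical helper)."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        # Python reports "look-behind requires fixed-width pattern" here.
        return False

# The pattern from the question: the lookbehind contains the lazy quantifier .*?
variable_width = r"(?<=<p.*?>).*?(?=</p>)"
# Same idea, but the lookbehind is exactly three characters wide.
fixed_width = r"(?<=<p>).*?(?=</p>)"

print(compiles(variable_width))  # False
print(compiles(fixed_width))     # True
print(re.findall(fixed_width, "<p>one</p><p>two</p>"))  # ['one', 'two']
```

The fixed-width version only works for a bare <p> tag, which is why it does not help once attributes such as class or ID are involved.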

Solution

My advice is to match everything, and use capture groups and backreferences to process your data.

Example

<p.*?>(.*?)<\/p>

So $1 (or \1, depending on the tool) would contain the data you want.
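
For a concrete picture, the following is a small sketch of the capture-group approach in Python. The sample HTML string and the name PARAGRAPH are assumptions made for this example. In Python the group is read with group(1) on a match, or written as \1 in a replacement string.

```python
import re

# The pattern from the answer; the forward slash needs no escaping in Python.
PARAGRAPH = re.compile(r"<p.*?>(.*?)</p>")

# Sample input made up for this example.
html = '<p class="intro" id="first">Hello</p><p>World</p>'

# With a single capture group, findall returns just the captured text.
contents = PARAGRAPH.findall(html)
print(contents)  # ['Hello', 'World']

# On an individual match, group(1) is the content and group(0) the whole match.
first = PARAGRAPH.search(html)
print(first.group(1))  # Hello

# In a replacement string, \1 refers back to the captured content.
bracketed = PARAGRAPH.sub(r"[\1]", html)
print(bracketed)  # [Hello][World]
```

Note that . does not match newlines by default, so a paragraph spanning several lines would need the re.DOTALL flag.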
