Regex - Match everything except HTML tags


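Question

I've searched for this but couldn't find a solution that worked for me. I need a regex pattern that matches all text except HTML tags, so I can convert it to Cyrillic (converting the tags too would obviously ruin the entire HTML =))

So, for example:

<p>text1</p>
<p>text2 <span class="theClass">text3</span></p>

I need to match text1, text2, and text3, so something like

preg_match_all("/pattern/", $text, $matches)

and then I would just iterate over the matches. If it can be done with preg_replace, replacing text1/2/3 with textA/B/C, that would be even better.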

Solution

As you probably know, regex is not a great choice for this (the general advice here is to use a DOM parser).
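For comparison, here is a minimal sketch of the DOM parser route using PHP's bundled DOMDocument and DOMXPath classes. It assumes the dom extension is enabled and uses the ASCII sample from the question; character encoding handling for real Cyrillic input is left out.

```php
<?php
// Sample markup from the question (ASCII only).
$html = '<p>text1</p><p>text2 <span class="theClass">text3</span></p>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

// The XPath expression //text() selects every text node in the document.
$xpath = new DOMXPath($doc);
$texts = [];
foreach ($xpath->query('//text()') as $node) {
    $texts[] = $node->nodeValue;
}

print_r($texts);
```

Each text node can also be modified in place through its nodeValue property and the document written back with saveHTML(), which avoids touching tags or attributes at all.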

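However, if you need a quick regex solution, you can use this (see demo):

<[^>]*>(*SKIP)(*F)|[^<]+

Here is how it works: on the left, <[^>]*> matches a complete <tag>, then (*SKIP)(*F) forces that match attempt to fail and makes the engine advance to the position in the string right after the last character of the matched tag. The right-hand alternative, [^<]+, then matches the run of text up to the next tag.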
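A minimal sketch of the pattern in use with preg_match_all, run on the sample from the question. The variable names are illustrative only.

```php
<?php
// Sample input from the question, with a line break between the paragraphs.
$text = "<p>text1</p>\n<p>text2 <span class=\"theClass\">text3</span></p>";

// Left alternative: match a tag, then discard it with (*SKIP)(*F).
// Right alternative: match a run of characters containing no "<".
$pattern = '/<[^>]*>(*SKIP)(*F)|[^<]+/';

preg_match_all($pattern, $text, $matches);

// $matches[0] holds the full matches, i.e. the text runs between tags.
$textRuns = $matches[0];
print_r($textRuns);
```

Note that the line break between the two paragraphs is also a run of non-"<" characters, so it shows up as a match of its own, and "text2 " keeps its trailing space.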

This is an application of a general technique to exclude patterns from matches (read the linked question for more details).
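Since the question also asks about replacing the text, the same pattern can be handed to preg_replace_callback. The sketch below uses strtoupper() purely as a placeholder for the real conversion, which the original post does not specify.

```php
<?php
$html = '<p>text1</p><p>text2 <span class="theClass">text3</span></p>';
$pattern = '/<[^>]*>(*SKIP)(*F)|[^<]+/';

// The callback receives each text run; tags are never passed to it.
// strtoupper() is only a stand-in for the actual text transformation.
$result = preg_replace_callback(
    $pattern,
    function (array $m): string {
        return strtoupper($m[0]);
    },
    $html
);

echo $result, "\n";
```

The tags, including the class attribute value, come through unchanged because the skipped tags are never part of a match.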

If you don't want matches to span several lines, add \r\n to the negated character class that does the matching, like so:

<[^>]*>(*SKIP)(*F)|[^<\r\n]+
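The difference between the two patterns can be seen on a text run that contains a line break. This is a small sketch with a made-up input string:

```php
<?php
// Hypothetical input: one paragraph whose text spans two lines.
$text = "<p>first line\nsecond line</p>";

// Original pattern: the text run may include the line break.
preg_match_all('/<[^>]*>(*SKIP)(*F)|[^<]+/', $text, $multi);

// Variant: \r and \n are excluded, so each line is matched separately.
preg_match_all('/<[^>]*>(*SKIP)(*F)|[^<\r\n]+/', $text, $single);

$spanning = $multi[0];   // one match containing the line break
$perLine  = $single[0];  // one match per line
print_r($spanning);
print_r($perLine);
```

With the variant, a line break that sits alone between two tags is no longer returned as a match either.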
