Match text outside of html tags

Views: 32 Published: 2021/9/23 20:18:09 Tags: c# html regex

Problem description

Before anyone says it: I know I should use a proper parser, but for my use case a regular expression is the better fit.

I have the following regex to try to match text outside of html tags:

(?<!<[^>]*)(?<Text>.+?)

However, this also seems to match the opening bracket of each tag, i.e. <. How can I fix this?

Sample input:

<span style="color:blue">some <strong>bold</strong> text</span>

Expected:

some bold text

Actual:

<some <bold< text<

Link to RegexStorm.

Recommended answer

The problem is that you are using ., which matches any character, including <. Replace it with a negated character class such as [^<>], which matches any character except < and >, and use a greedy quantifier: * (to match 0 or more occurrences) or + (to match 1 or more occurrences):

(?<!<[^>]*)(?<Text>[^<>]*)

See the regex demo.
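As a rough illustration, the corrected pattern could be applied in C# as sketched below. The class name TextOutsideTags and the method Extract are hypothetical names chosen for this example, and the sketch uses the + variant rather than * so that empty matches are not returned. It relies on .NET's support for variable-length lookbehind, so the pattern may not work in other regex engines.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the original answer.
public static class TextOutsideTags
{
    // The lookbehind rejects any position that has an unclosed '<' before it.
    // [^<>]+ then consumes a run of characters that are neither '<' nor '>'.
    private static readonly Regex OutsideTags =
        new Regex(@"(?<!<[^>]*)(?<Text>[^<>]+)");

    // Returns each run of text found outside of tags, in document order.
    public static string[] Extract(string html)
    {
        return OutsideTags.Matches(html)
            .Cast<Match>()
            .Select(m => m.Groups["Text"].Value)
            .ToArray();
    }
}
```

Calling string.Concat(TextOutsideTags.Extract(input)) on the sample input from the question should give some bold text, built from the three runs "some ", "bold" and " text".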

BTW, using (?<Text>.+?) at the end of the pattern makes the regex engine match just 1 char at a time, since +? is a lazy quantifier that matches 1 or more occurrences but as few as possible (and since 1 is enough, it always matches exactly 1 char). Usually there has to be some other pattern after a lazily quantified one; otherwise it will not capture the text you expect.
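The lazy-quantifier behavior can be checked with a small C# sketch. LazyQuantifierDemo and MatchLengths are hypothetical names introduced only for this illustration; the method reports the length of every match so the one-character effect is easy to see.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the original answer.
public static class LazyQuantifierDemo
{
    // Returns the length of each match of the pattern in the input.
    public static int[] MatchLengths(string pattern, string input)
    {
        return Regex.Matches(input, pattern)
            .Cast<Match>()
            .Select(m => m.Length)
            .ToArray();
    }
}
```

With nothing after the lazy quantifier, MatchLengths(@".+?", "bold") should yield four matches of length 1. Once something follows it, as in MatchLengths(@".+?(?=<)", "bold<"), the engine has to keep expanding until the lookahead succeeds, so the result should be a single match of length 4.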
