Regex replace text outside HTML tags


Question

I have this HTML:

"This is simple html text <span class='simple'>simple simple text text</span> text"


I need to match only words that are outside any HTML tag. For example, if I want to match "simple" and "text", I should get results only from "This is simple html text" and the final "text", so the result will be "simple": 1 match, "text": 2 matches. Could anyone help me with this? I'm using jQuery.

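// "\\b" in a string literal becomes the regex word boundary \b ("\b" alone is a backspace character)
var pattern = new RegExp("(\\b" + value + "\\b)", 'gi');

if (pattern.test(text)) {
    text = text.replace(pattern, "<span class='notranslate'>$1</span>");
}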
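A minimal sketch of the question's approach as plain JavaScript (no jQuery is needed for the string work), assuming Node.js or a browser console. The escapeRegExp and wrapEverywhere names are hypothetical helpers, not part of the question; escapeRegExp makes characters such as "." or "(" in value match literally. Run on the sample text, it shows the problem being asked about: every whole-word "simple" is wrapped, including the ones inside the inner <span> and even the class attribute value.

```javascript
// Hypothetical helper: escape regex metacharacters so `value` is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical wrapper around the question's approach:
// wrap every whole-word match of `value`, wherever it occurs.
function wrapEverywhere(text, value) {
  const pattern = new RegExp("(\\b" + escapeRegExp(value) + "\\b)", "gi");
  return text.replace(pattern, "<span class='notranslate'>$1</span>");
}

const text = "This is simple html text <span class='simple'>simple simple text text</span> text";
const result = wrapEverywhere(text, "simple");

// 4 occurrences are wrapped: one in the prose, one in the class attribute
// (which corrupts the markup) and two inside the inner <span>.
console.log(result);
```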

  • value is the word I want to match (in this case "simple")
  • text is "This is simple html text <span class='simple'>simple simple text text</span> text"
I need to wrap every matching word (in this example, "simple") in a <span>. But I want to wrap only words that are outside any HTML tags. The result of this example should be:

This is <span class='notranslate'>simple</span> html <span class='notranslate'>text</span> <span class='simple'>simple simple text text</span> <span class='notranslate'>text</span>

I do not want to replace any text inside

<span class='simple'>simple simple text text</span>

which should stay exactly as it was before the replacement.

Recommended answer

Okay, try using this regex:

(text|simple)(?![^<]*>|[^<>]*</)

Example using regex101.
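A minimal sketch of applying the answer's regex from JavaScript with String.prototype.replace and matchAll, assuming the words come from an array. The names escapeRegExp, outsideTagsPattern and wrapOutsideTags are illustrative. The \b word boundaries are an addition to the answer's pattern, so that, for example, "context" is not treated as a match for "text".

```javascript
// Hypothetical helper: escape regex metacharacters in each word.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build the answer's pattern (text|simple)(?![^<]*>|[^<>]*</) from a word list.
// The \b boundaries are an addition so that e.g. "context" is not matched.
function outsideTagsPattern(words) {
  const alternation = words.map(escapeRegExp).join("|");
  return new RegExp("\\b(" + alternation + ")\\b(?![^<]*>|[^<>]*</)", "gi");
}

function wrapOutsideTags(html, words) {
  return html.replace(outsideTagsPattern(words), "<span class='notranslate'>$1</span>");
}

const text = "This is simple html text <span class='simple'>simple simple text text</span> text";
const wrapped = wrapOutsideTags(text, ["text", "simple"]);
console.log(wrapped);

// Count matches per word (the question expects "simple": 1, "text": 2).
const counts = {};
for (const m of text.matchAll(outsideTagsPattern(["text", "simple"]))) {
  const w = m[1].toLowerCase();
  counts[w] = (counts[w] || 0) + 1;
}
console.log(counts);
```

On the sample text this produces exactly the result the question asks for. Since replace returns the string unchanged when nothing matches, the pattern.test(...) guard from the question is not needed.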

Breakdown:

(         # Open capture group
  text    # Match 'text'
|         # Or
  simple  # Match 'simple'
)         # End capture group
(?!       # Negative lookahead start (will cause match to fail if contents match)
  [^<]*   # Any number of non-'<' characters
  >       # A > character
|         # Or
  [^<>]*  # Any number of non-'<' and non-'>' characters
  </      # The characters < and /
)         # End negative lookahead.
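To see which alternative of the lookahead does the work, the two branches can be tested separately on the text that follows each occurrence. This is a diagnostic sketch, not part of the answer: a negative lookahead (?!A|B) at a position succeeds only when neither A nor B matches starting there, which is what the two anchored regexes (illustratively named insideTag and beforeCloseTag) check.

```javascript
// The two alternatives of the negative lookahead, anchored to the text after a match.
const insideTag = /^[^<]*>/;         // a '>' comes before any '<': inside a tag's markup
const beforeCloseTag = /^[^<>]*<\//; // the next tag is a closing tag: inside an element

const text = "This is simple html text <span class='simple'>simple simple text text</span> text";

// For every occurrence of the words, report which alternative (if any) rejects it.
const report = [];
for (const m of text.matchAll(/(text|simple)/g)) {
  const rest = text.slice(m.index + m[0].length);
  let verdict = "kept";
  if (insideTag.test(rest)) verdict = "rejected: [^<]*>";
  else if (beforeCloseTag.test(rest)) verdict = "rejected: [^<>]*</";
  report.push(m[0] + " @" + m.index + " -> " + verdict);
}
console.log(report.join("\n"));
```

Of the eight occurrences, three are kept, the one in class='simple' is rejected by the first branch, and the four inside the <span> content are rejected by the second.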
      

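The negative lookahead prevents a match when "text" or "simple" is inside a tag's markup (first alternative: a > comes before the next <, as with class='simple') or when the next tag is a closing tag (second alternative: the word is in an element's content, as in simple simple text text</span>).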
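One limitation follows from that description: the lookahead only inspects the next tag, so a word followed by an opening tag is still matched even when it sits inside an outer element, as in <b>simple <i>x</i></b>. A minimal sketch of a different technique (not the answer's): split the string into tags and text, track the nesting depth, and replace only in top-level text. The wrapTopLevel and VOID_ELEMENTS names are illustrative, and the sketch assumes well-formed HTML without comments, <script>/<style> content, or > inside attribute values.

```javascript
// Elements that never have a closing tag, so they must not change the depth.
const VOID_ELEMENTS = new Set(["area", "base", "br", "col", "embed", "hr", "img",
  "input", "link", "meta", "source", "track", "wbr"]);

// Hypothetical helper: wrap words only in text at nesting depth 0 (outside every element).
// Assumes well-formed HTML with no comments, scripts, or '>' inside attribute values.
function wrapTopLevel(html, words) {
  const escaped = words.map(w => w.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"));
  const wordPattern = new RegExp("\\b(" + escaped.join("|") + ")\\b", "gi");
  // Splitting with a capture group keeps the tags in the resulting array.
  const parts = html.split(/(<[^>]*>)/);
  let depth = 0;
  return parts.map(part => {
    if (part.startsWith("<")) {
      const name = (part.match(/^<\/?\s*([a-zA-Z0-9-]+)/) || [])[1];
      const tag = name ? name.toLowerCase() : "";
      if (part.startsWith("</")) depth = Math.max(0, depth - 1);
      else if (!part.endsWith("/>") && !VOID_ELEMENTS.has(tag)) depth += 1;
      return part;
    }
    return depth === 0
      ? part.replace(wordPattern, "<span class='notranslate'>$1</span>")
      : part;
  }).join("");
}

// The answer's pattern (with \b added) wraps both occurrences here,
// although the first one is inside <b>.
const answerPattern = /\b(simple)\b(?![^<]*>|[^<>]*<\/)/gi;
const nested = "<b>simple <i>x</i></b> simple";
console.log(nested.replace(answerPattern, "[$1]"));
// The depth-tracking version wraps only the top-level occurrence.
console.log(wrapTopLevel(nested, ["simple"]));
```

In a browser, working on DOM text nodes instead (for example jQuery's .contents(), which unlike .children() also returns text nodes, filtered to nodeType === 3) avoids parsing HTML with regular expressions altogether.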

