Regex replace text outside html tags


Problem description

    I have this HTML:

    "This is simple html text <span class='simple'>simple simple text text</span> text"
    

    I need to match only words that are outside any HTML tag. For example, if I want to match "simple" and "text", I should get results only from "This is simple html text" and the trailing "text": 1 match for "simple" and 2 matches for "text". Could anyone help me with this? I'm using jQuery.

    var pattern = new RegExp("(\\b" + value + "\\b)", 'gi');
    
    if (pattern.test(text)) {
        text = text.replace(pattern, "<span class='notranslate'>$1</span>");
    }
    

    • value is the word I want to match (in this case "simple")
    • text is "This is simple html text <span class='simple'>simple simple text text</span> text"

    I need to wrap all selected words (in this example it is "simple") with <span>. But I want to wrap only words that are outside any HTML tags. The result of this example should be

    This is <span class='notranslate'>simple</span> html <span class='notranslate'>text</span> <span class='simple'>simple simple text text</span> <span class='notranslate'>text</span>
    

    I do not want to replace any text inside

    <span class='simple'>simple simple text text</span>
    

    It should stay the same as it was before the replacement.

    Solution

    Okay, try using this regex:

    (text|simple)(?![^<]*>|[^<>]*</)
    

    Working example on regex101.

    Breakdown:

    (         # Open capture group
      text    # Match 'text'
    |         # Or
      simple  # Match 'simple'
    )         # End capture group
    (?!       # Negative lookahead start (will cause match to fail if contents match)
      [^<]*   # Any number of non-'<' characters
      >       # A > character
    |         # Or
      [^<>]*  # Any number of non-'<' and non-'>' characters
      </      # The characters < and /
    )         # End negative lookahead.
    

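    The negative lookahead prevents a match if "text" or "simple" appears inside a tag itself (for example in an attribute value, where a > follows before any <), or in text whose next tag is a closing tag, that is, between an opening and a closing tag.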
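To tie the answer back to the code in the question, here is a minimal sketch of how the lookahead might be combined with the original `\b` word boundaries and the `replace` call. The helper names `escapeRegExp` and `wrapOutsideTags` are hypothetical and not part of the original answer; the sketch also assumes the words may contain regex metacharacters, so it escapes them first.

```javascript
// Hypothetical helper: escape regex metacharacters so each word is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: wrap whole-word occurrences of `words` that the
// lookahead from the answer treats as being outside HTML tags.
function wrapOutsideTags(html, words) {
  const alternation = words.map(escapeRegExp).join('|');
  const pattern = new RegExp(
    '\\b(' + alternation + ')\\b' + // the words, as whole words (as in the question)
    '(?![^<]*>|[^<>]*</)',          // the negative lookahead from the answer
    'gi'
  );
  return html.replace(pattern, "<span class='notranslate'>$1</span>");
}

const input =
  "This is simple html text <span class='simple'>simple simple text text</span> text";
const output = wrapOutsideTags(input, ['simple', 'text']);
console.log(output);
```

For the sample input this should produce the result requested in the question. Note that the lookahead is a heuristic rather than an HTML parser: a word inside an element that is followed by a nested opening tag (for example `<p>simple <b>x</b></p>`) would still be wrapped, so for arbitrary markup walking the DOM text nodes is likely the more robust route.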
