How to remove only HTML tags in a string using JavaScript

Published: 2018/6/19 20:11:09   Tags: javascript, jquery, html, string


Question

I want to remove HTML tags from a given string using JavaScript. I looked into the current approaches, but each of them has an unsolved problem.

Current solutions

(1) Using JavaScript: create a virtual div element and get its text
  function remove_tags(html)
  {
       var tmp = document.createElement("DIV");
       tmp.innerHTML = html; 
       return tmp.textContent||tmp.innerText; 
  }
(2) Using regex
  function remove_tags(html)
  {
       return html.replace(/<(?:.|\n)*?>/gm, '');
  }
(3) Using jQuery
  function remove_tags(html)
  {
       return jQuery(html).text();
  }
These three solutions work correctly, but if the string looks like this
  <div> hello <hi all !> </div>
the stripped string is just
      hello . But I only want to remove the HTML tags, so the result should be hello <hi all !>
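A minimal reproduction of the problem, runnable as plain JavaScript (for example in Node.js), applies the pattern from solution (2) to the example string. The lazy `*?` stops at the first `>`, so `<hi all !>` is matched exactly like a real tag and its text is lost. The function name `removeTagsRegex` is only a label for this sketch.

```javascript
// Solution (2) from the question: delete everything from "<" to the next ">".
function removeTagsRegex(html) {
  return html.replace(/<(?:.|\n)*?>/gm, "");
}

var input = "<div> hello <hi all !> </div>";
var output = removeTagsRegex(input);

// "<hi all !>" fits the pattern too, so it disappears along with <div> and </div>.
console.log(JSON.stringify(output)); // prints " hello  "
```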

Edit: The background is that I want to strip all HTML tags that users type into a particular text area, but I still want to allow users to enter text like <hi all>. The current approaches remove any content enclosed in <>.
Solution
Using a regex might not be a problem if you take a different approach. For instance, look for all tags, and then check whether each tag name matches a list of defined, valid HTML tag names:
// Do element constructors expose their WebIDL prototypes (modern browsers)?
var protos = document.body.constructor === window.HTMLBodyElement,
    // Fallback list of known tag names for browsers without those prototypes
    validHTMLTags = /^(?:a|abbr|acronym|address|applet|area|article|aside|audio|b|base|basefont|bdi|bdo|bgsound|big|blink|blockquote|body|br|button|canvas|caption|center|cite|code|col|colgroup|data|datalist|dd|del|details|dfn|dir|div|dl|dt|em|embed|fieldset|figcaption|figure|font|footer|form|frame|frameset|h1|h2|h3|h4|h5|h6|head|header|hgroup|hr|html|i|iframe|img|input|ins|isindex|kbd|keygen|label|legend|li|link|listing|main|map|mark|marquee|menu|menuitem|meta|meter|nav|nobr|noframes|noscript|object|ol|optgroup|option|output|p|param|plaintext|pre|progress|q|rp|rt|ruby|s|samp|script|section|select|small|source|spacer|span|strike|strong|style|sub|summary|sup|table|tbody|td|textarea|tfoot|th|thead|time|title|tr|track|tt|u|ul|var|video|wbr|xmp)$/i;

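function sanitize(txt) {
    var // This regex normalises anything between quotes: it escapes < and > inside
        // ="..." and ='...' values. (A backreference such as \1 does not work inside
        // a character class, so each quote style is matched explicitly.)
        normaliseQuotes = /=(?:"[^"]*"|'[^']*')/g,
        normaliseFn = function ($0) {
            return $0.replace(/</g, '&lt;').replace(/>/g, '&gt;');
        },
        replaceInvalid = function ($0, tag, off, txt) {
            var
                // Is it a valid tag? Use the element prototypes where they exist,
                // otherwise fall back to the validHTMLTags list.
                invalidTag = protos ?
                    document.createElement(tag) instanceof HTMLUnknownElement :
                    !validHTMLTags.test(tag),

                // Is the tag complete, i.e. is there a '>' before the next '<'?
                isComplete = txt.slice(off + 1).search(/^[^<]+>/) > -1;

            // Escape the '<' of anything that isn't a complete, valid tag
            return invalidTag || !isComplete ? '&lt;' + tag : $0;
        };

    // Tag names must start with a letter, so text such as "<3" is left alone
    // (and document.createElement never receives an invalid name)
    txt = txt.replace(normaliseQuotes, normaliseFn)
             .replace(/<([a-z]\w*)/gi, replaceInvalid);

    // Parse in an inert document where supported, so markup such as
    // <img onerror=...> is not loaded or executed while parsing
    var doc = document.implementation && document.implementation.createHTMLDocument ?
            document.implementation.createHTMLDocument("") : document,
        tmp = doc.createElement("DIV");
    tmp.innerHTML = txt;

    // Read back only the text (innerText for old IE without textContent)
    return "textContent" in tmp ? tmp.textContent : tmp.innerText;
}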
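The quote-normalising step can be tried on its own. In a pattern without the `u` flag, `\1` inside square brackets is a legacy octal escape for U+0001, not a backreference, so a pattern written as `/=(["'])(?=[^\1]*[<>])[^\1]*\1/g` keeps matching until the last matching quote in the whole string. The sketch below contrasts that with a version that matches each quote style explicitly; both function names are hypothetical and chosen for illustration.

```javascript
// Hypothetical helper: escape "<" and ">" inside ="..." and ='...' values so a later
// pass cannot mistake them for tag delimiters. Each quote style is matched explicitly.
function normaliseQuotedValues(txt) {
  return txt.replace(/=(?:"[^"]*"|'[^']*')/g, function (value) {
    return value.replace(/</g, "&lt;").replace(/>/g, "&gt;");
  });
}

// The character-class pattern, for contrast: [^\1] means "not U+0001", so the
// greedy match runs on to the LAST matching quote in the string.
function normaliseWithCharClass(txt) {
  return txt.replace(/=(["'])(?=[^\1]*[<>])[^\1]*\1/g, function (value) {
    return value.replace(/</g, "&lt;").replace(/>/g, "&gt;");
  });
}

var sample = '<a title="1 > 0">link</a> and <b class="x">bold</b>';
var fixed = normaliseQuotedValues(sample);
var buggy = normaliseWithCharClass(sample);

console.log(fixed); // <a title="1 &gt; 0">link</a> and <b class="x">bold</b>
console.log(buggy); // <a title="1 &gt; 0"&gt;link&lt;/a&gt; and &lt;b class="x">bold</b>
```

In the second result the closing `</a>` and the opening `<b` have been escaped as well, so they would survive as literal text instead of being stripped.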



  Working Demo: http://jsfiddle.net/m9vZg/3/
This works because, once the '<' of an invalid tag has been escaped, browsers parse the leftover '>' as text: it isn't part of a matching '<' opening tag. It doesn't suffer from the same problems as trying to parse HTML tags with a regular expression, because you're only looking for the opening delimiter and the tag name; everything else is irrelevant.
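The escaping step can be exercised without a browser. The sketch below is a simplified, DOM-free version of `replaceInvalid`: the `isKnownTag` predicate and the short tag list in the usage lines are assumptions standing in for the `HTMLUnknownElement` check and the `validHTMLTags` list. Only the opening `<` of an unknown or incomplete tag is escaped; the stray `>` is left for the HTML parser to treat as text.

```javascript
// Escape the "<" of any opening tag that is unknown or has no ">" before the next "<".
// isKnownTag is a caller-supplied predicate (hypothetical stand-in for the real checks).
function escapeInvalidOpeners(txt, isKnownTag) {
  return txt.replace(/<([a-z]\w*)/gi, function (match, tag, offset, whole) {
    // Complete means: a ">" appears before any further "<".
    var isComplete = /^[^<]+>/.test(whole.slice(offset + 1));
    return isKnownTag(tag) && isComplete ? match : "&lt;" + tag;
  });
}

// Usage with a tiny, illustrative whitelist.
var known = /^(?:div|p|span|b|i)$/i;
var isKnown = function (t) { return known.test(t); };

var escaped = escapeInvalidOpeners("<div> hello <hi all !> </div>", isKnown);
console.log(escaped); // <div> hello &lt;hi all !> </div>
```

Assigning `escaped` to an element's `innerHTML` and reading `textContent` then gives ` hello <hi all !> `: `&lt;` decodes back to `<`, and the lone `>` was never part of a tag.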

It's also future-proof: the WebIDL specification tells vendors how to implement prototypes for HTML elements, so we try to create an HTML element from the current matching tag. If the element is an instance of HTMLUnknownElement, we know that it's not a valid HTML tag. The validHTMLTags regular expression defines a list of HTML tags for older browsers, such as IE 6 and 7, that do not implement these prototypes.
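The validity check can be isolated into a small function that prefers the prototype test and falls back to a list. This is a sketch, not the answer's exact code: it detects support with `typeof HTMLUnknownElement` instead of comparing `document.body.constructor`, wraps `createElement` in `try`/`catch` because it throws for strings that are not valid element names, and uses a deliberately short, hypothetical `FALLBACK_TAGS` list. Outside a browser (for example in Node.js) it always takes the fallback branch.

```javascript
// Hypothetical, abbreviated fallback list; the full validHTMLTags regex is the real one.
var FALLBACK_TAGS = /^(?:a|b|br|div|em|i|p|span|strong)$/i;

function isValidTag(tag) {
  var hasPrototypes =
    typeof document !== "undefined" &&
    typeof HTMLUnknownElement !== "undefined";

  if (hasPrototypes) {
    try {
      // Unknown names such as "hi" produce an HTMLUnknownElement.
      return !(document.createElement(tag) instanceof HTMLUnknownElement);
    } catch (e) {
      // createElement throws an InvalidCharacterError for invalid names.
      return false;
    }
  }
  // No element prototypes available: consult the static list.
  return FALLBACK_TAGS.test(tag);
}

console.log(isValidTag("DIV"), isValidTag("hi")); // true false
```

One consequence of the prototype check: in browsers, hyphenated names that are valid custom element names (such as `my-widget`) produce a plain `HTMLElement`, not an `HTMLUnknownElement`, so they count as valid tags and would be stripped.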
