Strict HTML parsing in JavaScript

Views: 39 | Published: 2021/5/14 20:05:01 | Tags: javascript html html-parsing

Question

On Google Chrome (Canary), it seems no string can make the DOM parser fail. I'm trying to parse some HTML, but if the HTML isn't completely, 100%, valid, I want it to display an error. I've tried the obvious:

var newElement = document.createElement('div');
newElement.innerHTML = someMarkup; // Might fail on IE, never on Chrome.

I've also tried the method in this question. Doesn't fail for invalid markup, even the most invalid markup I can produce.

So, is there some way to parse HTML "strictly" in Google Chrome at least? I don't want to resort to tokenizing it myself or using an external validation utility. If there's no other alternative, a strict XML parser is fine, but certain elements don't require closing tags in HTML, and preferably those shouldn't fail.

Answer

Demo: http://jsfiddle.net/q66Ep/1/

/* DOM parser for text/html, see https://stackoverflow.com/a/9251106/938089 */
;(function(DOMParser) {
    "use strict";
    var DOMParser_proto = DOMParser.prototype,
        real_parseFromString = DOMParser_proto.parseFromString;

    // Keep the native implementation if it already supports text/html.
    try {
        if ((new DOMParser).parseFromString("", "text/html")) return;
    } catch (e) {}

    DOMParser_proto.parseFromString = function(markup, type) {
        if (/^\s*text\/html\s*(;|$)/i.test(type)) {
            var doc = document.implementation.createHTMLDocument(""),
                doc_elt = doc.documentElement,
                first_elt;
            doc_elt.innerHTML = markup;
            first_elt = doc_elt.firstElementChild;
            // If the markup supplied its own <html> element, make it the root.
            if (doc_elt.childElementCount === 1 && first_elt.localName.toLowerCase() === "html")
                doc.replaceChild(first_elt, doc_elt);
            return doc;
        } else {
            return real_parseFromString.apply(this, arguments);
        }
    };
}(DOMParser));

/*
 * @description              Validate an HTML string
 * @param       String html  The HTML string to be validated
 * @returns            null  If the string is not well-formed XML
 *                    false  If the string contains an unknown element
 *                     true  If the string is well-formed XML and contains
 *                           only known HTML elements
 */
function validateHTML(html) {
    var parser = new DOMParser()
      , d = parser.parseFromString('<?xml version="1.0"?>'+html,'text/xml')
      , allnodes;
    if (d.querySelector('parsererror')) {
        console.log('Not well-formed HTML (XML)!');
        return null;
    } else {
        /* To use text/html, see https://stackoverflow.com/a/9251106/938089 */
        d = parser.parseFromString(html, 'text/html');
        allnodes = d.getElementsByTagName('*');
        for (var i=allnodes.length-1; i>=0; i--) {
            if (allnodes[i] instanceof HTMLUnknownElement) return false;
        }
    }
    return true; /* The document is syntactically correct, all tags are closed */
}

console.log(validateHTML('<div>'));   // null, because the closing tag is missing
console.log(validateHTML('<x></x>')); // false, because <x> is not an HTML element
console.log(validateHTML('<a></a>')); // true, because the tag is closed
                                      //       and <a> is an HTML element
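
Because validateHTML has three possible results and both null and false are falsy, a plain `if (!validateHTML(html))` check cannot tell the two failure modes apart. The following is a minimal sketch of how a caller might report them separately. The name describeValidationResult and the message texts are hypothetical and not part of the original answer.

```javascript
// Hypothetical helper (an assumption, not from the original answer):
// maps the tri-state result of validateHTML() to a readable message.
// Strict comparisons are used because null and false are both falsy.
function describeValidationResult(result) {
    if (result === null) {
        return 'Markup is not well-formed XML (for example, a tag is left unclosed).';
    }
    if (result === false) {
        return 'Markup contains an element that is not a known HTML element.';
    }
    if (result === true) {
        return 'Markup is well-formed and uses only known HTML elements.';
    }
    throw new TypeError('Unexpected validation result: ' + result);
}

// In a browser, with validateHTML defined as above, usage could look like:
//   console.log(describeValidationResult(validateHTML('<div>')));
```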

See revision 1 of this answer for an alternative to XML validation without the DOMParser.

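The current method completely ignores the doctype during validation.
This method returns null for input that is valid HTML5 but leaves tags unclosed, because such input is not well-formed XML.
Conformance is not checked.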
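
One possible way to soften the unclosed-tag limitation is to rewrite void elements (the elements that never take an end tag in HTML) into self-closing form before handing the string to the XML parser. The sketch below is an assumption layered on top of the answer, not part of it: the names VOID_ELEMENTS and selfCloseVoidElements are hypothetical, and the regular expression is deliberately naive. It will misbehave on attribute values that contain `>`, and it does nothing about other HTML-only syntax such as optional end tags (`<p>`, `<li>`) or valueless attributes (`disabled`), which an XML parser still rejects.

```javascript
// Hypothetical pre-processing step (not from the original answer).
// Void elements as listed in the HTML Living Standard.
var VOID_ELEMENTS = ['area', 'base', 'br', 'col', 'embed', 'hr', 'img',
                     'input', 'link', 'meta', 'source', 'track', 'wbr'];

// Matches an opening tag of a void element, with or without a trailing slash.
// The lookahead keeps names such as <basefont> or <br-custom> from matching.
var VOID_TAG_PATTERN = new RegExp(
    '<(' + VOID_ELEMENTS.join('|') + ')(?=[\\s/>])([^>]*?)\\s*/?>', 'gi');

// Rewrites <br>, <br/> and <br /> alike to the XML-friendly form <br />.
function selfCloseVoidElements(html) {
    return html.replace(VOID_TAG_PATTERN, function (match, name, attrs) {
        return '<' + name + attrs + ' />';
    });
}

// In a browser, the result could then be passed to the validator:
//   validateHTML(selfCloseVoidElements(someMarkup));
```

Running the rewrite before validation keeps validateHTML itself unchanged, so the strict XML check still catches genuinely unclosed non-void tags such as `<div>`.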


