将HTML映射到JSON [英] Map HTML to JSON
问题描述
我试图将HTML映射到JSON,并保持结构完整。有没有这样做的图书馆,还是我需要写我自己的图书馆?我想,如果没有html2json库,我可以将xml2json库作为开始。毕竟,html只是xml的一个变种吗?
更新:好的,我应该举个例子。我想要做的是以下几点。解析一个html字符串:
< div>
< span>文字< / span> Text2
< / div>
放入像这样的json对象:
{
type:div,
content:[
{
type:span,
content:[
Text2
]
},
Text2
]
}
注意:如果您没有注意到标签,我正在寻找Javascript解决方案
我只是写了这个函数,做你想做的,试试看,让我知道它是否无法正确工作你:
//用一个元素进行测试。
var initElement = document.getElementsByTagName(html)[0];
var json = mapDOM(initElement,true);
console.log(json);
//用字符串测试。
initElement =< div>< span>文字< / span> Text2< / div>;
json = mapDOM(initElement,true);
console.log(json);
函数mapDOM(element,json){
var treeObject = {};
//如果字符串转换为文档节点
if(typeof element ===string){
if(window.DOMParser){
parser = new的DOMParser();
docNode = parser.parseFromString(element,text / xml);
} else {// Microsoft再次触发
docNode = new ActiveXObject(Microsoft.XMLDOM);
docNode.async = false;
docNode.loadXML(element);
}
element = docNode.firstChild;
}
//递归循环DOM元素并将属性赋值给对象
函数treeHTML(element,object){
object [type] = element。节点名称;
var nodeList = element.childNodes;
if(nodeList!= null){
if(nodeList.length){
object [content] = [];
for(var i = 0; i< nodeList.length; i ++){
if(nodeList [i] .nodeType == 3){
object [content]。push (节点列表[I] .nodeValue);
} else {
object [content]。push({});
treeHTML(nodeList [i],object [content] [object [content]。length -1]);
$ b if(element.attributes!= null){
if(element.attributes.length){
object [attributes] = {};
for(var i = 0; i< element.attributes.length; i ++){
object [attributes] [element.attributes [i] .nodeName] = element.attributes [i] .nodeValue;
}
}
}
}
treeHTML(element,treeObject);
return(json)? JSON.stringify(treeObject):treeObject;
}
工作示例: http://jsfiddle.net/JUSsf/ (在Chrome中测试过,我无法保证完整的浏览器支持 - 您将不得不测试它)。
它创建一个包含HTML页面树形结构的对象,格式为您请求的格式,然后使用 JSON.stringify()
这是包括在大多数现代浏览器(IE8 +,Firefox 3 +等);如果您需要支持旧版浏览器,则可以添加 json2.js 。
它可以包含有效的XHTML作为参数的 DOM元素
或字符串
(我相信,我不确定在某些情况下 DOMParser()
是否会因为设置为text / xml code>或者它只是不提供错误处理。不幸的是,
text / html
浏览器支持不佳。)
通过传递不同的值作为元素
,您可以轻松更改此函数的范围。无论您传递什么值,都将是您的JSON地图的根源。
享受
I'm attempting map HTML into JSON with structure intact. Are there any libraries out there that do this or will I need to write my own? I suppose if there are no html2json libraries out there I could take an xml2json library as a start. After all, html is only a variant of xml anyway right?
UPDATE: Okay, I should probably give an example. What I'm trying to do is the following. Parse a string of html:
<div>
<span>text</span>Text2
</div>
into a json object like so:
{
"type" : "div",
"content" : [
{
"type" : "span",
"content" : [
"Text2"
]
},
"Text2"
]
}
NOTE: In case you didn't notice the tag, I'm looking for a solution in Javascript
I just wrote this function that does what you want, try it out let me know if it doesn't work correctly for you:
// Test with an element.
var initElement = document.getElementsByTagName("html")[0];
var json = mapDOM(initElement, true);
console.log(json);
// Test with a string.
initElement = "<div><span>text</span>Text2</div>";
json = mapDOM(initElement, true);
console.log(json);
function mapDOM(element, json) {
var treeObject = {};
// If string convert to document Node
if (typeof element === "string") {
if (window.DOMParser) {
parser = new DOMParser();
docNode = parser.parseFromString(element,"text/xml");
} else { // Microsoft strikes again
docNode = new ActiveXObject("Microsoft.XMLDOM");
docNode.async = false;
docNode.loadXML(element);
}
element = docNode.firstChild;
}
//Recursively loop through DOM elements and assign properties to object
function treeHTML(element, object) {
object["type"] = element.nodeName;
var nodeList = element.childNodes;
if (nodeList != null) {
if (nodeList.length) {
object["content"] = [];
for (var i = 0; i < nodeList.length; i++) {
if (nodeList[i].nodeType == 3) {
object["content"].push(nodeList[i].nodeValue);
} else {
object["content"].push({});
treeHTML(nodeList[i], object["content"][object["content"].length -1]);
}
}
}
}
if (element.attributes != null) {
if (element.attributes.length) {
object["attributes"] = {};
for (var i = 0; i < element.attributes.length; i++) {
object["attributes"][element.attributes[i].nodeName] = element.attributes[i].nodeValue;
}
}
}
}
treeHTML(element, treeObject);
return (json) ? JSON.stringify(treeObject) : treeObject;
}
Working example: http://jsfiddle.net/JUSsf/ (Tested in chrome, I can't guarantee full browser support - you will have to test this).
It creates an object that contains the tree structure of the HTML page in the format you requested and then uses JSON.stringify()
which is included in most modern browsers (IE8+, Firefox 3+ .etc); If you need to support older browsers you can include json2.js.
It can take either a DOM element
or a string
containing valid XHTML as an argument (I believe, I'm not sure whether the DOMParser()
will choke in certain situations as it is set to "text/xml"
or whether it just doesn't provide error handling. Unfortunately "text/html"
has poor browser support).
You can easily change the range of this function by passing a different value as element
. Whatever value you pass will be the root of your JSON map.
Enjoy
这篇关于将HTML映射到JSON的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!