RegEx Convert XML to JSON
Question
This is a continuation of a previous question: I need to convert XML to JSON in JavaScript on Parse.com's Cloud Code.
Please don't downvote this because you don't believe RegEx is the right choice for this. It's what I have to work with. If you have another idea for how to do this, please let me know, but it must run on Parse.com's Cloud Code.
Original XML:
<?xml version="1.0" encoding="UTF-8" ?><api><products total-matched="1618" records-returned="1" page-number="1"><product><ad-id>1234</ad-id><supplier-name>Window World</supplier-name><supplier-category>3703703</supplier-category><buy-url>http://website.com</buy-url><currency>USD</currency><description>Window</description><image-url>http://website.com/windowa/80x80.jpg</image-url><in-stock>yes</in-stock><manufacturer-name>Window World</manufacturer-name><name>Half Pain Glass</name><price>31.95</price><retail-price>87.60</retail-price><sale-price>29.95</sale-price><sku>5938</sku><upc></upc></product><product><ad-id>1234</ad-id><supplier-name>Window World</supplier-name><supplier-category>3703703</supplier-category><buy-url>http://website.com</buy-url><currency>USD</currency><description>Window</description><image-url>http://website.com/windowa/80x80.jpg</image-url><in-stock>yes</in-stock><manufacturer-name>Window World</manufacturer-name><name>Half Pain Glass</name><price>31.95</price><retail-price>87.60</retail-price><sale-price>29.95</sale-price><sku>5938</sku><upc></upc></product><product><ad-id>1234</ad-id><supplier-name>Window World</supplier-name><supplier-category>3703703</supplier-category><buy-url>http://website.com</buy-url><currency>USD</currency><description>Window</description><image-url>http://website.com/windowa/80x80.jpg</image-url><in-stock>yes</in-stock><manufacturer-name>Window World</manufacturer-name><name>Half Pain Glass</name><price>31.95</price><retail-price>87.60</retail-price><sale-price>29.95</sale-price><sku>5938</sku><upc></upc></product><product><ad-id>1234</ad-id><supplier-name>Window World</supplier-name><supplier-category>3703703</supplier-category><buy-url>http://website.com</buy-url><currency>USD</currency><description>Window</description><image-url>http://website.com/windowa/80x80.jpg</image-url><in-stock>yes</in-stock><manufacturer-name>Window World</manufacturer-name><name>Half Pain 
Glass</name><price>31.95</price><retail-price>87.60</retail-price><sale-price>29.95</sale-price><sku>5938</sku><upc></upc></product></products></api>
RegEx code:
var regex = /(<\w+[^<]*?)\s+([\w-]+)="([^"]+)">/;
while(xml.match(regex)) xml = xml.replace(regex, '<$2>$3</$2>$1>'); // For attributes
xml = xml.replace(/\s/g, ' '). // Replaces each whitespace character (tab, newline, ...) with a space
replace(/< *\?[^>]*?\? *>/g, ''). //Finds the XML header and removes it
replace(/< *!--[^>]*?-- *>/g, ''). //Finds and removes all comments
replace(/< *(\/?) *(\w[\w-]+\b):(\w[\w-]+\b)/g, '<$1$2_$3').
replace(/< *(\w[\w-]+\b)([^>]*?)\/ *>/g, '< $1$2>').
replace(/(\w[\w-]+\b):(\w[\w-]+\b) *= *"([^>]*?)"/g, '$1_$2="$3"').
replace(/< *(\w[\w-]+\b)((?: *\w[\w-]+ *= *" *[^"]*?")+ *)>( *[^< ]*?\b.*?)< *\/ *\1 *>/g, '< $1$2 value="$3">').
//replace(/ *(\w[\w-]+\b) *= *"([^>]*?)" */g, '< $1>$2').
replace(/< *(\w[\w-]+\b) *</g, '<$1>< ').
replace(/> *>/g, '>').
//replace(/< *\/ *(\w[\w-]+\b) *> *< *\1 *>/g, ''). // breaks the output?
replace(/"/g, '\\"').
replace(/< *(\w[\w-]+\b) *>([^<>]*?)< *\/ *\1 *>/g, '"$1":"$2",').
replace(/< *(\w[\w-]+\b) *>([^<>]*?)< *\/ *\1 *>/g, '"$1":{$2},').
replace(/< *(\w[\w-]+\b) *>(?=.*?< \/\1\},\{)/g, '"$1":[{').
split(/\},\{/).
reverse().
join('},{').
replace(/< *\/ *(\w[\w-]+\b) *>(?=.*?"\1":\[\{)/g, '}],').
split(/\},\{/).
reverse().
join('},{').
replace(/< \/(\w[\w-]+\b)\},\{\1>/g, '},{').
replace(/< *(\w[\w-]+\b)[^>]*?>/g, '"$1":{').
replace(/< *\/ *\w[\w-]+ *>/g,'},').
replace(/\} *,(?= *(\}|\]))/g, '}').
replace(/] *,(?= *(\}|\]))/g, ']').
replace(/" *,(?= *(\}|\]))/g, '"').
replace(/ *, *$/g, '');
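The attribute loop at the top of this code is easier to follow in isolation. The sketch below runs only those two lines on a shortened, hypothetical tag (the input string is made up for illustration; the regex and replacement are copied from the code above). It shows that each attribute is lifted out, last one first, into an element placed before its tag, which is presumably why the attributes of products show up as keys of "api" in the output.

```javascript
// Shortened, hypothetical input: one tag with two attributes.
var xml = '<products total-matched="1618" page-number="1"></products>';

// Same regex and replacement as in the question's code.
var regex = /(<\w+[^<]*?)\s+([\w-]+)="([^"]+)">/;

// Each pass moves the LAST attribute of the first matching tag
// in front of that tag as its own element.
while (xml.match(regex)) xml = xml.replace(regex, '<$2>$3</$2>$1>');

console.log(xml);
// <page-number>1</page-number><total-matched>1618</total-matched><products></products>
```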
Output:
"api": {
"page-number": "1",
"records-returned": "1",
"total-matched": "1618",
"products": {
"product": {
"ad-id": "1234",
"supplier-name": "Window World",
"supplier-category": "3703703",
"buy-url": "http://website.com",
"currency": "USD",
"description": "Window",
"image-url": "http://website.com/windowa/80x80.jpg",
"in-stock": "yes",
"manufacturer-name": "Window World",
"name": "Half Pain Glass",
"price": "31.95",
"retail-price": "87.60",
"sale-price": "29.95",
"sku": "5938",
"upc": ""
},
"product": {
"ad-id": "1234",
"supplier-name": "Window World",
"supplier-category": "3703703",
"buy-url": "http://website.com",
"currency": "USD",
"description": "Window",
"image-url": "http://website.com/windowa/80x80.jpg",
"in-stock": "yes",
"manufacturer-name": "Window World",
"name": "Half Pain Glass",
"price": "31.95",
"retail-price": "87.60",
"sale-price": "29.95",
"sku": "5938",
"upc": ""
},
"product": {
"ad-id": "1234",
"supplier-name": "Window World",
"supplier-category": "3703703",
"buy-url": "http://website.com",
"currency": "USD",
"description": "Window",
"image-url": "http://website.com/windowa/80x80.jpg",
"in-stock": "yes",
"manufacturer-name": "Window World",
"name": "Half Pain Glass",
"price": "31.95",
"retail-price": "87.60",
"sale-price": "29.95",
"sku": "5938",
"upc": ""
},
"product": {
"ad-id": "1234",
"supplier-name": "Window World",
"supplier-category": "3703703",
"buy-url": "http://website.com",
"currency": "USD",
"description": "Window",
"image-url": "http://website.com/windowa/80x80.jpg",
"in-stock": "yes",
"manufacturer-name": "Window World",
"name": "Half Pain Glass",
"price": "31.95",
"retail-price": "87.60",
"sale-price": "29.95",
"sku": "5938",
"upc": ""
}
}
}
The last issue I'm having with this (that I know of) is that it doesn't turn repeating items into a JSON array. Any ideas on how to solve this?
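To make the problem concrete, here is a minimal sketch of what happens when output shaped like the above is parsed. The string is a shortened, hypothetical version of the output (two products, one field each), wrapped in braces so that JSON.parse accepts it. With repeated keys, JSON.parse keeps only the last value, so all but one product is lost.

```javascript
// Shortened, hypothetical version of the output above, wrapped in { } so it parses.
var text = '{"products":{"product":{"sku":"1"},"product":{"sku":"2"}}}';

var parsed = JSON.parse(text);

// The two "product" keys collapse into one; the last value wins.
console.log(Object.keys(parsed.products).length); // 1
console.log(parsed.products.product.sku);         // "2"
```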
Recommended answer
OK, note that this is a quick fix, but it nevertheless seems to work. It just ADDS an array structure, so the same key no longer repeats directly inside the parent object (the key itself is not removed).
Change:
replace(/< *(\w[\w-]+\b) *>(?=.*?< \/\1\},\{)/g, '"$1":[{').
split(/\},\{/).
reverse().
join('},{').
replace(/< *\/ *(\w[\w-]+\b) *>(?=.*?"\1":\[\{)/g, '}],').
split(/\},\{/).
reverse().
join('},{').
which is an attempt to implement arrays.
And put this in its place:
replace(/< *(\w[\w-]+\b) *>(?=("\w[\w-]+\b)":\{.*?\},\2)(.*?)< *\/ *\1 *>/, '"$1":[$3],')
Note that I matched things pretty much the same way the original code does. That seemed to work for your example, at least.
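As a rough check of what that replacement does, the sketch below applies it to a hand-built, hypothetical intermediate string (approximately what the pipeline holds at that point, cut down to two products with one field each). The last two steps are assumptions that go beyond the answer: a replace that drops the "product" key left in front of each array element (hard-coded to this data), followed by the trailing-comma cleanup taken from the question's code.

```javascript
// Hypothetical intermediate string, shortened to two products.
var intermediate =
  '<products>"product":{"sku":"1",},"product":{"sku":"2",},</products>';

// The replacement from the answer, unchanged.
var wrapped = intermediate.replace(
  /< *(\w[\w-]+\b) *>(?=("\w[\w-]+\b)":\{.*?\},\2)(.*?)< *\/ *\1 *>/,
  '"$1":[$3],'
);
console.log(wrapped);
// "products":["product":{"sku":"1",},"product":{"sku":"2",},],

// ASSUMED extra step (not part of the answer): remove the repeated
// "product" key in front of each array element. Hard-coded to this data.
var unkeyed = wrapped.replace(/([\[,])"product":\{/g, '$1{');

// Trailing-comma cleanup, copied from the end of the question's code.
var cleaned = unkeyed.
  replace(/\} *,(?= *(\}|\]))/g, '}').
  replace(/] *,(?= *(\}|\]))/g, ']').
  replace(/" *,(?= *(\}|\]))/g, '"').
  replace(/ *, *$/g, '');

// Wrapping in { } is also an assumption; the pipeline emits a bare key.
var parsed = JSON.parse('{' + cleaned + '}');
console.log(parsed.products.length); // 2
```

Without the assumed key-stripping step, the array elements keep their "product": prefix, which JSON.parse rejects, so some step along these lines is probably needed before parsing.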