JavaScript RegEx to match punctuation NOT part of any HTML tags
Problem description
Okay, I know there's a lot of controversy about matching and parsing HTML with a RegEx, but I was wondering if I could get some help. Case in point:
I need to match any punctuation characters, e.g. . , " '
but I don't want to ruin any HTML, so ideally the match should occur between a > and a <. Essentially, my query isn't so much about parsing HTML as avoiding it.
I'm going to attempt to wrap each instance in a <span></span>, but having absolutely no experience in RegEx, I'm not sure I'm able to do it.
I've figured out the character set [\.\,\'\"\?\!] but I'm not sure how to match character sets that only occur between certain characters. Can anybody help?
Answer
To start off, here's a cross-browser DOM-parser function:
var parseXML = (function(w,undefined)
{
'use strict';
var parser,ie = false;
switch (true)
{
case w.DOMParser !== undefined:
parser = new w.DOMParser();
break;
        case w.ActiveXObject !== undefined: //old IE; test that the constructor exists before calling it
parser = new w.ActiveXObject("Microsoft.XMLDOM");
parser.async = false;
ie = true;
break;
default :
throw new Error('No parser found');
}
return function(xmlString)
{
if (ie === true)
{//return DOM
parser.loadXML(xmlString);
return parser;
}
return parser.parseFromString(xmlString,'text/xml');
};
})(this);
//usage:
var newDom = parseXML(yourString);
var allTags = newDom.getElementsByTagName('*');
for(var i=0;i<allTags.length;i++)
{
if (allTags[i].tagName.toLowerCase() === 'span')
{//if all you want to work with are the spans:
        if (allTags[i].getElementsByTagName('*').length > 0)
        {
            //this span has element nodes inside (hasChildNodes() would also count plain text), don't apply regex:
continue;
}
allTags[i].innerHTML = allTags[i].innerHTML.replace(/[.,?!'"]+/g,'');
}
}
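This should help you on your way. You still have access to the DOM, so whenever you find a string that needs filtering or replacing, you can reference the node using allTags[i] and replace the contents.
Note that looping through all elements isn't recommended, but I didn't really feel like doing all of the work for you ;-). You'll have to check what kind of node you're handling:
if (allTags[i].tagName.toLowerCase() === 'span')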
{//do certain things
}
if (allTags[i].tagName.toLowerCase() === 'html')
{//skip
continue;
}
And that sort of stuff...
Note that this code is not tested, but it's a simplified version of my answer to a previous question. The parser bit should work just fine; in fact, here's a fiddle I set up for that other question, which also shows how you might want to alter this code to better suit your needs.
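For the original goal of wrapping punctuation in a span without touching the tags, here is a minimal string-based sketch. It is a different technique from the DOM approach above: it splits the markup on anything that looks like a tag and only rewrites the pieces in between. The function name wrapPunctuation and the class name punct are made up for illustration. It assumes reasonably clean markup: a > inside an attribute value, a comment, or the contents of a script element would confuse it, and in those cases the DOM route is the safer one.

```javascript
// Hypothetical helper: wrap runs of punctuation that sit outside tags in a span.
function wrapPunctuation(html) {
    // Split on tags; the capture group keeps the tags themselves in the result.
    var parts = html.split(/(<[^>]*>)/);
    for (var i = 0; i < parts.length; i++) {
        // With a single capture group, the odd indexes hold the tags: skip them.
        if (i % 2 === 1) {
            continue;
        }
        // Even indexes are text between tags; $& inserts the matched punctuation.
        parts[i] = parts[i].replace(/[.,?!'"]+/g, '<span class="punct">$&</span>');
    }
    return parts.join('');
}

var result = wrapPunctuation('<p class="a.b">Hi, there!</p>');
```

Here the dot inside the class attribute is left alone because it belongs to a tag, while the comma and the exclamation mark in the text each end up inside their own span.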