JavaScript RegEx to match punctuation NOT part of any HTML tags

Views: 82 | Published: 2018/6/23 15:28:27 | Tags: javascript, html, regex


Question


Okay, I know there's a lot of controversy about matching and parsing HTML with a RegEx, but I was wondering if I could get some help. Case in point:

I need to match any punctuation characters, e.g. . , " ', but I don't want to ruin any HTML, so ideally the match should only occur between a > and a <. Essentially, my question isn't so much about parsing HTML as about avoiding it.

I'm going to try to wrap each instance in a <span></span>, but having absolutely no experience with RegEx, I'm not sure I can do it.
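For a plain string that contains no markup, the wrapping step on its own is a single replace call. In the sketch below, `wrapPunctuation` is a hypothetical helper name; `$&` in the replacement string is the standard JavaScript pattern for "the text that was matched". Inside a character class, `.`, `?`, `!` and the quotes are literal, so they need no backslash escapes.

```javascript
// Hypothetical helper: wrap every . , ' " ? ! in its own <span>.
// Only safe for plain text; in markup it would also hit attribute values.
function wrapPunctuation(text) {
    // Inside [...] these characters are literal, so no escapes are needed.
    // "$&" in the replacement inserts whatever the regex matched.
    return text.replace(/[.,'"?!]/g, '<span>$&</span>');
}

console.log(wrapPunctuation("Wait, what?"));
// Wait<span>,</span> what<span>?</span>
```

Run on markup such as <a href="page.html">, the same call would also wrap the quotes and the dot inside the attribute, which is exactly the damage the question wants to avoid.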

I've figured out the character set [\.\,\'\"\?\!], but I'm not sure how to match characters from that set only when they occur between certain characters. Can anybody help?
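One way to restrict the replacement to text between a > and a < is to let a single regex consume the input as alternating tokens: either a complete tag, or a run of text up to the next <. A replace callback then returns tags unchanged and wraps punctuation only in the text runs. This is a sketch under explicit assumptions: the markup is simple, no attribute value contains a >, and there are no comments, CDATA sections, or <script>/<style> contents. The name `wrapPunctuationOutsideTags` is hypothetical.

```javascript
// Hypothetical helper: wrap punctuation in <span>s only outside of tags.
// Assumes simple markup: no '>' inside attribute values, and no comments,
// CDATA, or <script>/<style> bodies (their contents would be treated as text).
function wrapPunctuationOutsideTags(html) {
    // Each match is either a whole tag (<...>) or a run of text up to the next '<'.
    return html.replace(/<[^>]*>|[^<]+/g, function (token) {
        if (token.charAt(0) === '<') {
            return token; // a tag: leave it exactly as it was
        }
        return token.replace(/[.,'"?!]/g, '<span>$&</span>');
    });
}

console.log(wrapPunctuationOutsideTags('<a href="page.html">Hi, you!</a>'));
// <a href="page.html">Hi<span>,</span> you<span>!</span></a>
```

A common one-regex alternative is a negative lookahead, /[.,'"?!](?![^<]*>)/g, which skips any character that is followed by a > before the next <, i.e. one that sits inside a tag. The tokenizing version is easier to extend, for example to skip certain tags.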
Solution
To start off, here's a cross-browser DOM parser function:
```javascript
var parseXML = (function (w, undefined) {
    'use strict';
    var parser, ie = false;
    switch (true) {
        case w.DOMParser !== undefined:
            parser = new w.DOMParser();
            break;
        // Test for the constructor itself: calling `new` on an undefined
        // ActiveXObject would throw before the default case is reached.
        case w.ActiveXObject !== undefined:
            parser = new w.ActiveXObject("Microsoft.XMLDOM");
            parser.async = false;
            ie = true;
            break;
        default:
            throw new Error('No parser found');
    }
    return function (xmlString) {
        if (ie === true) {
            // Old IE: the XMLDOM object itself is the resulting document
            parser.loadXML(xmlString);
            return parser;
        }
        return parser.parseFromString(xmlString, 'text/xml');
    };
})(this);

// usage:
var newDom = parseXML(yourString);
var allTags = newDom.getElementsByTagName('*');
for (var i = 0; i < allTags.length; i++) {
    if (allTags[i].tagName.toLowerCase() === 'span') { // if all you want to work with are the spans:
        // A span's own text is a child node too, so check for child *elements*:
        if (allTags[i].children.length > 0) {
            // this span has elements inside, don't apply regex:
            continue;
        }
        allTags[i].innerHTML = allTags[i].innerHTML.replace(/[.,?!'"]+/g, '');
    }
}
```
This should help you on your way. You still have access to the DOM, so whenever you find a string that needs filtering/replacing, you can reference the node using allTags[i] and replace the contents.
Note that looping through all elements isn't recommended, but I didn't really feel like doing all of the work for you ;-). You'll have to check what kind of node you're handling:
```javascript
if (allTags[i].tagName.toLowerCase() === 'span') {
    // do certain things
}
if (allTags[i].tagName.toLowerCase() === 'html') {
    // skip
    continue;
}
```
And that sort of stuff...
Note that this code is untested, but it's a simplified version of my answer to a previous question. The parser bit should work just fine; in fact, here's a fiddle I set up for that other question, which also shows how you might alter this code to better suit your needs.
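If the markup is parsed anyway, as the answer does, a more robust variant visits only text nodes, so tag names and attribute values can never be affected, and wraps each punctuation mark in a new span element. The sketch below assumes a standard W3C DOM (such as a browser's); the function name `wrapPunctuationInTextNodes` is hypothetical, and skipping <script> and <style> is a choice of this sketch rather than something the answer specifies.

```javascript
// Hypothetical helper: wrap each . , ' " ? ! found in text nodes under `root`
// in its own <span>. Only text nodes are edited, so markup is never touched.
function wrapPunctuationInTextNodes(root) {
    var doc = root.ownerDocument || root; // a Document has no ownerDocument
    var textNodes = [];

    // Collect text nodes first; editing the tree while walking it would
    // shift indices and skip or revisit nodes.
    (function collect(node) {
        for (var i = 0; i < node.childNodes.length; i++) {
            var child = node.childNodes[i];
            if (child.nodeType === 3) {            // text node
                textNodes.push(child);
            } else if (child.nodeType === 1) {     // element node
                var name = child.nodeName.toLowerCase();
                if (name !== 'script' && name !== 'style') {
                    collect(child);
                }
            }
        }
    })(root);

    textNodes.forEach(function (textNode) {
        // A capturing group makes split() keep the separators:
        // "Hi, you!" -> ["Hi", ",", " you", "!", ""]
        var parts = textNode.nodeValue.split(/([.,'"?!])/);
        if (parts.length === 1) {
            return;                                // no punctuation in this node
        }
        var fragment = doc.createDocumentFragment();
        parts.forEach(function (part, index) {
            if (part === '') {
                return;
            }
            if (index % 2 === 1) {                 // odd indices hold the punctuation
                var span = doc.createElement('span');
                span.appendChild(doc.createTextNode(part));
                fragment.appendChild(span);
            } else {
                fragment.appendChild(doc.createTextNode(part));
            }
        });
        textNode.parentNode.replaceChild(fragment, textNode);
    });
}
```

In a browser, HTML input can be parsed with DOMParser using the 'text/html' type, which tolerates markup that isn't well-formed XML (with 'text/xml', such input yields a parser-error document instead): var doc = new DOMParser().parseFromString(yourString, 'text/html'); wrapPunctuationInTextNodes(doc.body); var result = doc.body.innerHTML;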
