JavaScript RegEx to match punctuation NOT part of any HTML tags

Question

Okay, I know there's much controversy about matching and parsing HTML with a RegEx, but I was wondering if I could have some help. Case in point:

I need to match any punctuation characters, e.g. . , " ' but I don't want to ruin any HTML, so ideally the match should occur between a > and a <. Essentially, my query isn't so much about parsing HTML as avoiding it.

I'm going to attempt to wrap each instance in a <span></span>, but having absolutely no experience in RegEx, I'm not sure I'm able to do it.

I've figured out the character set [\.\,\'\"\?\!] but I'm not sure how to match a character set only when it occurs between certain characters. Can anybody help?

Solution

To start off, here's a cross-browser DOM parser function:

var parseXML = (function(w,undefined)
{
    'use strict';
    var parser,ie = false;
    switch (true)
    {
        case w.DOMParser !== undefined:
            // standards-based browsers
            parser = new w.DOMParser();
        break;
        case w.ActiveXObject !== undefined:
            // old IE: test for the constructor before calling it, otherwise
            // `new undefined(...)` throws and the default case is never reached
            parser = new w.ActiveXObject("Microsoft.XMLDOM");
            parser.async = false;
            ie = true;
        break;
        default :
            throw new Error('No parser found');
    }
    return function(xmlString)
    {
        if (ie === true)
        {//return DOM
            parser.loadXML(xmlString);
            return parser;
        }
        // 'text/xml' expects well-formed markup; malformed input
        // yields a document describing the parse error instead
        return parser.parseFromString(xmlString,'text/xml');
    };
})(this);
//usage:
var newDom = parseXML(yourString);
var allTags = newDom.getElementsByTagName('*');
for(var i=0;i<allTags.length;i++)
{
    if (allTags[i].tagName.toLowerCase() === 'span')
    {//if all you want to work with are the spans:
        if (allTags[i].getElementsByTagName('*').length > 0)
        {
            //this span has elements inside, don't apply regex
            //(hasChildNodes() would also be true for a span holding only text):
            continue;
        }
        //strip runs of punctuation from the span's text:
        allTags[i].innerHTML = allTags[i].innerHTML.replace(/[.,?!'"]+/g,'');
    }
}
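
The loop above strips punctuation from leaf spans. For the wrapping the question asks about, one possible approach is to walk the text nodes and leave every element alone. The following is only a rough sketch, not part of the original answer: `splitPunctuation` and `wrapPunctuation` are hypothetical helper names, and it assumes a reasonably modern browser DOM (`createDocumentFragment`, `replaceChild`, and an array-sliceable `childNodes` list).

```javascript
// Same character set as the regex above. The capture group makes
// split() keep the punctuation runs in its result.
var PUNCTUATION = /([.,?!'"]+)/;

// Hypothetical helper: break a string into plain pieces and punctuation runs.
function splitPunctuation(text) {
    return text.split(PUNCTUATION).filter(function (piece) {
        return piece !== '';
    });
}

// Hypothetical helper: wrap every punctuation run found in the text nodes
// under `root` in a <span>. `doc` is the document that owns `root`.
function wrapPunctuation(root, doc) {
    // Copy the live childNodes list first, because nodes get replaced below.
    var children = Array.prototype.slice.call(root.childNodes);
    children.forEach(function (node) {
        if (node.nodeType === 1) {
            // Element node: recurse into it, its tag and attributes stay untouched.
            wrapPunctuation(node, doc);
        } else if (node.nodeType === 3) {
            // Text node: the only place where punctuation is wrapped.
            var fragment = doc.createDocumentFragment();
            splitPunctuation(node.nodeValue).forEach(function (piece) {
                if (PUNCTUATION.test(piece)) {
                    var span = doc.createElement('span');
                    span.appendChild(doc.createTextNode(piece));
                    fragment.appendChild(span);
                } else {
                    fragment.appendChild(doc.createTextNode(piece));
                }
            });
            root.replaceChild(fragment, node);
        }
    });
}

// Possible usage in a browser (not run here):
// wrapPunctuation(newDom.documentElement, newDom);
```

Because only text nodes are edited, punctuation inside tag names and attribute values is never touched. Running the function twice on the same tree would wrap the punctuation twice, so a real version would probably want to skip spans it created itself, as well as elements such as script and style.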

This should help you on your way. You still have access to the DOM, so whenever you find a string that needs filtering/replacing, you can reference the node using allTags[i] and replace the contents.
Note that looping through all elements isn't recommended, but I didn't really feel like doing all of the work for you ;-). You'll have to check what kind of node you're handling:

if (allTags[i].tagName.toLowerCase() === 'span')
{//do certain things
}
if (allTags[i].tagName.toLowerCase() === 'html')
{//skip
    continue;
}    

And that sort of stuff...
Note that this code is not tested, but it's a simplified version of my answer to a previous question. The parser bit should work just fine; in fact, here's a fiddle I've set up for that other question, which also shows how you might want to alter this code to better suit your needs.
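
If parsing into a DOM is not an option, the "between a > and a <" idea from the question can be approximated on the string itself by splitting it into tags and text and only touching the text pieces. This is a naive sketch under stated assumptions, not a robust HTML tokenizer: `wrapPunctuationInText` is a hypothetical name, and the tag pattern assumes no `>` appears inside attribute values, comments, or script blocks.

```javascript
// The capture group makes split() keep the tags as separate pieces.
var TAG = /(<[^>]*>)/;
// Runs of the punctuation characters from the question.
var PUNCTUATION_RUN = /[.,?!'"]+/g;

// Hypothetical helper: wrap punctuation in <span>s, but only outside tags.
function wrapPunctuationInText(html) {
    return html.split(TAG).map(function (piece) {
        // Pieces that start with "<" are tags: pass them through unchanged.
        if (piece.charAt(0) === '<') {
            return piece;
        }
        // "$&" in the replacement string stands for the matched text.
        return piece.replace(PUNCTUATION_RUN, '<span>$&</span>');
    }).join('');
}

var wrapped = wrapPunctuationInText('<p class="a.b">Hi, there!</p>');
// wrapped === '<p class="a.b">Hi<span>,</span> there<span>!</span></p>'
```

The dot and quotes inside the class attribute survive because that whole piece is treated as a tag. Note that inside a character class none of . , ? ! ' " needs a backslash, so the escapes in the question's character set are harmless but unnecessary.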
