Highlight words in HTML using regex & JavaScript - almost there


Problem description


I am writing a jQuery plugin that will do a browser-style find-on-page search. I need to improve the search, but don't want to get into parsing the HTML quite yet.

At the moment my approach is to take an entire DOM element and all nested elements and simply run a regex find/replace for a given term. In the replace I will simply wrap a span around the matched term and use that span as my anchor to do highlighting, scrolling, etc. It is vital that no characters inside any html tags are matched.

This is as close as I have gotten:

(?<=^|>)([^><].*?)(?=<|$)

It does a very good job of capturing all characters that are not in an html tag, but I'm having trouble figuring out how to insert my search term.

Input: Any HTML element (this could be quite large, e.g. <body>)
Search term: 1 or more characters
Replacement text: <span class='highlight'>$1</span>

UPDATE

The following regex does what I want when I'm testing with http://gskinner.com/RegExr/...

Regex: (?<=^|>)(.*?)(SEARCH_STRING)(?=.*?<|$)
Replacement: $1<span class='highlight'>$2</span>

However, I am having some trouble using it in my JavaScript. With the following code, Chrome gives me the error "Invalid regular expression: /(?<=^|>)(.*?)(Mary)(?=.*?<|$)/: Invalid group".

var origText = $('#'+opt.targetElements).data('origText');
var regx = new RegExp("(?<=^|>)(.*?)(" + $this.val() + ")(?=.*?<|$)", 'gi');
$('#'+opt.targetElements).each(function() {
   var text = origText.replace(regx, '$1<span class="' + opt.resultClass + '">$2</span>');
   $(this).html(text);
});

It's breaking on the group (?<=^|>) - is this something clumsy or a difference in the Regex engines?

UPDATE

This regex breaks on that group because JavaScript did not support regex lookbehinds at the time of writing (they were only added in ECMAScript 2018, so older engines still reject them). For reference & possible solutions: http://blog.stevenlevithan.com/archives/mimic-lookbehind-javascript.
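One way to avoid the lookbehind altogether is to let the regex match either a whole tag or the search term, and only wrap the term. The following is a minimal sketch of that idea, not the approach from the linked article; `escapeRegExp` and `highlightHtml` are hypothetical helper names, and the tag pattern `<[^>]*>` assumes that no attribute value contains a literal `>`. It also does not treat comments or script/style content specially.

```javascript
// Escape characters that have special meaning in a regular expression,
// so that user input such as "a.b" is matched literally.
function escapeRegExp(term) {
    return term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: wrap every occurrence of `term` that lies outside
// a tag in <span class="className">. Tags are matched by the first
// alternative and copied through unchanged, so text inside them is skipped.
function highlightHtml(html, term, className) {
    if (!term) { return html; } // an empty term would match everywhere
    var regx = new RegExp("(<[^>]*>)|(" + escapeRegExp(term) + ")", "gi");
    return html.replace(regx, function (whole, tag, hit) {
        if (tag !== undefined) { return tag; } // a tag: leave it alone
        return '<span class="' + className + '">' + hit + "</span>";
    });
}
```

Because the tag alternative comes first, a term that also appears in an attribute (for example a class name) is consumed as part of the tag and never wrapped. Escaping the term matters as well: the original snippet passes `$this.val()` straight into `new RegExp`, so input like `(` would throw.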

Solution

Just use jQuery's built-in text() method. It returns the combined text of the selected DOM element and its descendants, without any markup.

For the DOM approach (docs for the Node interface): Run over all child nodes of an element. If the child is an element node, run recursively. If it's a text node, search in the text (node.data), and if you want to highlight/change something, shorten the text of the node to end at the found position, then insert a highlight span with the matched text and another text node for the rest of the text.

Example code (adjusted, origin is here):

(function iterate_node(node) {
    if (node.nodeType === 3) { // Node.TEXT_NODE
        var text = node.data,
            match = /any regular expression/.exec(text); // indexOf also applicable
        if (match && match[0].length > 0) { // ignore empty matches
            var pos = match.index,
                length = match[0].length; // length of what was actually found
            node.data = text.substring(0, pos); // split into a part before...
            var rest = document.createTextNode(text.substring(pos + length)); // a part after
            var highlight = document.createElement("span"); // and a part between
            highlight.className = "highlight";
            highlight.appendChild(document.createTextNode(match[0]));
            node.parentNode.insertBefore(rest, node.nextSibling); // insert after
            node.parentNode.insertBefore(highlight, node.nextSibling);
            iterate_node(rest); // maybe there are more matches
        }
    } else if (node.nodeType === 1) { // Node.ELEMENT_NODE
        // childNodes is live: copy it first, otherwise the spans inserted
        // above are visited too and get highlighted again without end
        var children = Array.prototype.slice.call(node.childNodes);
        for (var i = 0; i < children.length; i++) {
            iterate_node(children[i]); // run recursive on DOM
        }
    }
})(content); // any dom node
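
The text-splitting step can also be looked at in isolation, without touching the DOM. The sketch below is an illustration only: `splitMatches` is a hypothetical helper that cuts a string into consecutive segments and marks which ones matched. A DOM walker could then turn each marked segment into a highlight span and each unmarked one into a text node.

```javascript
// Hypothetical helper: split `text` into consecutive segments, flagging
// the ones that matched `regex`. Joining the segment texts gives `text` back.
function splitMatches(text, regex) {
    // Work on a global copy so that exec() advances through the string.
    var flags = regex.flags.indexOf("g") === -1 ? regex.flags + "g" : regex.flags;
    var re = new RegExp(regex.source, flags);
    var segments = [], last = 0, m;
    while ((m = re.exec(text)) !== null) {
        if (m[0].length === 0) { re.lastIndex++; continue; } // skip empty matches
        if (m.index > last) {
            segments.push({ text: text.substring(last, m.index), match: false });
        }
        segments.push({ text: m[0], match: true });
        last = m.index + m[0].length;
    }
    if (last < text.length) {
        segments.push({ text: text.substring(last), match: false });
    }
    return segments;
}
```

Keeping this part free of DOM calls makes it easy to check the matching logic on plain strings before wiring it into the recursive walk.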

There's also highlight.js, which might be exactly what you want.
