Get index of each capture in a JavaScript regex

Question

I want to match a regex like /(a).(b)(c.)d/ with "aabccde", and get the following information back:

"a" at index = 0
"b" at index = 2
"cc" at index = 3

How can I do this? String.match returns the list of captured strings and the index where the complete match starts, but not the index of each capture.

Edit: a test case which wouldn't work with plain indexOf:

regex: /(a).(.)/
string: "aaa"
expected result: "a" at 0, "a" at 2
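To make the failure concrete, here is a small sketch of the plain indexOf idea applied to that test case: each captured string is looked up from the start of the text, so two identical captures both resolve to the first occurrence.

```javascript
// Naive approach: look up each captured string with indexOf from the start.
const m = /(a).(.)/.exec("aaa");   // [ 'aaa', 'a', 'a' ]

const naive = m.slice(1).map(sub => "aaa".indexOf(sub));
console.log(naive); // [ 0, 0 ]  but the captures really start at 0 and 2
```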

Note: The question is similar to Javascript Regex: How to find index of each subexpression? (https://stackoverflow.com/questions/11568345/javascript-regex-how-to-find-index-of-each-subexpression), but I cannot modify the regex to make every subexpression a capturing group.

Recommended answer

So, you have a text and a regular expression:

var txt = "aabccde";
var re = /(a).(b)(c.)d/;

The first step is to get the list of all substrings that match the regular expression:

var subs = re.exec(txt);   // [whole match, capture 1, capture 2, ...] plus .index
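As a quick illustration of this first step (a sketch, assuming a non-global regex so that `exec` starts at the beginning of the string), the result holds the whole match, the captured strings, and a single `index` for the start of the whole match. That is why the positions of the individual captures still have to be found separately.

```javascript
var txt = "aabccde";
var re = /(a).(b)(c.)d/;
var subs = re.exec(txt);

// subs[0] is the whole match, subs[1..] are the captured strings,
// and subs.index is where the whole match starts in txt.
console.log(subs[0]);        // aabccd
console.log(subs.slice(1));  // [ 'a', 'b', 'cc' ]
console.log(subs.index);     // 0
```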

Then you can search the text for each captured substring in turn. Keep the position just past the previous capture in a variable, and start each search from there. I've named this variable cursor.

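var cursor = subs.index;                   // start of the whole match
for (var i = 1; i < subs.length; i++) {
    var sub = subs[i];                     // text captured by group i
    var index = txt.indexOf(sub, cursor);  // first occurrence at or after cursor
    cursor = index + sub.length;           // next search starts after this capture

    console.log(sub + ' at index ' + index);
}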
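The loop above can be wrapped in a function that returns `[capture, index]` pairs, which makes its limitation easy to see. This is a sketch: `captureIndicesByCursor` is a hypothetical name, it assumes a non-global regex, and reporting `-1` for a group that did not take part in the match is this sketch's own addition (the original loop would throw on `sub.length` in that case).

```javascript
// The cursor-based search, returning [capturedText, index] pairs.
// captureIndicesByCursor is a hypothetical name used only for this sketch.
function captureIndicesByCursor(txt, re) {
  const subs = re.exec(txt);
  if (subs === null) return [];            // no match at all
  const result = [];
  let cursor = subs.index;                 // start of the whole match
  for (let i = 1; i < subs.length; i++) {
    const sub = subs[i];
    if (sub === undefined) {               // group did not take part in the match
      result.push([sub, -1]);
      continue;
    }
    const index = txt.indexOf(sub, cursor); // first occurrence at or after cursor
    result.push([sub, index]);
    cursor = index + sub.length;           // next search starts after this capture
  }
  return result;
}

console.log(captureIndicesByCursor("aabccde", /(a).(b)(c.)d/));
// [ [ 'a', 0 ], [ 'b', 2 ], [ 'cc', 3 ] ]
console.log(captureIndicesByCursor("aaa", /(a).(.)/));
// [ [ 'a', 0 ], [ 'a', 1 ] ]  (the second capture really starts at 2)
```

The first example comes out right, but on "aaa" the second search starts right after the first capture and finds the `a` consumed by the uncaptured `.`, so it reports 1 instead of 2.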

Edit: Thanks to @nhahtdh, I've improved the mechanism and made it into a complete function:

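// Heuristic: locate each capture by searching for its text after the
// previous capture. It can pick the wrong occurrence when a capture's
// text repeats nearby (see the examples below).
String.prototype.matchIndex = function (re) {
    var res  = [];
    var subs = this.match(re);
    if (subs === null) return res;             // no match at all

    for (var cursor = subs.index, l = subs.length, i = 1; i < l; i++) {
        var index = cursor;

        if (i + 1 !== l && subs[i] !== subs[i + 1]) {
            // Take the last occurrence of this capture that starts no later
            // than the first occurrence of the next capture.
            var nextIndex = this.indexOf(subs[i + 1], cursor);
            while (true) {
                var currentIndex = this.indexOf(subs[i], index);
                if (currentIndex !== -1 && currentIndex <= nextIndex)
                    index = currentIndex + 1;
                else
                    break;
            }
            index--;
        } else {
            // Last capture, or same text as the next one: first occurrence.
            index = this.indexOf(subs[i], cursor);
        }
        cursor = index + subs[i].length;

        res.push([subs[i], index]);
    }
    return res;
};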


console.log("aabccde".matchIndex(/(a).(b)(c.)d/));
// [ [ 'a', 1 ], [ 'b', 2 ], [ 'cc', 3 ] ]  <-- 'a' is really at 0

console.log("aaa".matchIndex(/(a).(.)/));
// [ [ 'a', 0 ], [ 'a', 1 ] ]  <-- problem here: the second 'a' is at 2

console.log("bababaaaaa".matchIndex(/(ba)+.(a*)/));
// [ [ 'ba', 4 ], [ 'aaa', 6 ] ]  <-- 'aaa' is really at 7
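Not part of the original answer: in engines that support ES2022 RegExp match indices (the `d` flag, available for example in Node.js 16+), the exact positions come straight from the regex engine, with no text searching. The sketch below assumes such an engine; `captureIndices` is a hypothetical helper name, and reporting `-1` for a group that did not take part in the match is this sketch's own convention. It copies the regex with `new RegExp(re.source, re.flags + "d")`, so the original pattern is left unmodified, which fits the constraint in the question.

```javascript
// With the `d` flag (hasIndices), exec() also returns an `indices` array:
// indices[i] is [start, end] for group i, or undefined if the group
// did not take part in the match.
function captureIndices(txt, re) {
  // Copy the regex with the `d` flag so the caller's regex is left untouched.
  const withIndices = re.hasIndices ? re : new RegExp(re.source, re.flags + "d");
  const m = withIndices.exec(txt);
  if (m === null) return [];               // no match at all
  const result = [];
  for (let i = 1; i < m.length; i++) {
    const span = m.indices[i];
    result.push([m[i], span === undefined ? -1 : span[0]]);
  }
  return result;
}

console.log(captureIndices("aabccde", /(a).(b)(c.)d/));
// [ [ 'a', 0 ], [ 'b', 2 ], [ 'cc', 3 ] ]
console.log(captureIndices("aaa", /(a).(.)/));
// [ [ 'a', 0 ], [ 'a', 2 ] ]
console.log(captureIndices("bababaaaaa", /(ba)+.(a*)/));
// [ [ 'ba', 4 ], [ 'aaa', 7 ] ]
```

Because the indices are reported by the engine itself, repeated text such as "aaa" is no longer ambiguous. For a quantified group like `(ba)+`, the indices describe its last iteration, consistent with the captured text.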