Get index of each capture in a JavaScript regex
Question
I want to match a regex like /(a).(b)(c.)d/ with "aabccde", and get the following information back:
"a" at index = 0
"b" at index = 2
"cc" at index = 3
How can I do this? String.match returns the list of matches and the index of the start of the complete match, not the index of every capture.
Edit: A test case that wouldn't work with plain indexOf:
regex: /(a).(.)/
string: "aaa"
expected result: "a" at 0, "a" at 2
Note: The question is similar to "Javascript Regex: How to find index of each subexpression?" (https://stackoverflow.com/questions/11568345/javascript-regex-how-to-find-index-of-each-subexpression), but I cannot modify the regex to make every subexpression a capturing group.
Recommended answer
So, you have a text and a regular expression:
var txt = "aabccde";
var re = /(a).(b)(c.)d/;
The first step is to run the regular expression and get the full match together with the list of captured substrings:
var subs = re.exec(txt);
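For reference, here is a small sketch of what exec returns for the example from the question. The variable names (exampleText, exampleRe, execResult) are only illustrative and are not part of the original answer.

```javascript
// Inspect the result of exec for the example from the question.
const exampleText = "aabccde";
const exampleRe = /(a).(b)(c.)d/;
const execResult = exampleRe.exec(exampleText);

console.log(execResult[0]);       // full match: "aabccd"
console.log(execResult.slice(1)); // captures: [ 'a', 'b', 'cc' ]
console.log(execResult.index);    // start of the full match: 0
```

Only the position of the full match is reported here, which is why the positions of the individual captures have to be worked out separately.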
Then, you can do a simple search on the text for each substring. You will have to keep the end position of the last substring found in a variable, so that each search starts where the previous one stopped. I've named this variable cursor.
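// Start searching at the beginning of the full match
var cursor = subs.index;
for (var i = 1; i < subs.length; i++){
    var sub = subs[i];
    // Look for this capture at or after the end of the previous one
    var index = txt.indexOf(sub, cursor);
    cursor = index + sub.length;
    console.log(sub + ' at index ' + index);
}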
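To make the limits of this simple search visible, the loop above can be wrapped in a function and run against both inputs from the question. This is only a sketch: the name captureIndexesBySearch is hypothetical, and the check for undefined captures (groups that did not take part in the match) is an addition that the original loop does not have.

```javascript
// Hypothetical helper: locate each capture by searching forward from a cursor.
function captureIndexesBySearch(text, re) {
  const subs = re.exec(text);
  if (subs === null) return [];
  const found = [];
  let cursor = subs.index;
  for (let i = 1; i < subs.length; i++) {
    const sub = subs[i];
    if (sub === undefined) continue; // group did not participate in the match
    const index = text.indexOf(sub, cursor);
    cursor = index + sub.length;
    found.push([sub, index]);
  }
  return found;
}

console.log(captureIndexesBySearch("aabccde", /(a).(b)(c.)d/));
// [ [ 'a', 0 ], [ 'b', 2 ], [ 'cc', 3 ] ]
console.log(captureIndexesBySearch("aaa", /(a).(.)/));
// [ [ 'a', 0 ], [ 'a', 1 ] ]  (the question expects the second "a" at 2)
```

The second call shows the weakness: characters consumed by the uncaptured "." are invisible to the search, so a capture can be found too early.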
Edit: Thanks to @nhahtdh, I've improved the mechanism and made a complete function:
String.prototype.matchIndex = function(re){
    var res = [];
    var subs = this.match(re);
    // No match: nothing to report
    if (subs === null) return res;
    for (var cursor = subs.index, l = subs.length, i = 1; i < l; i++){
        var index = cursor;
        if (i+1 !== l && subs[i] !== subs[i+1]) {
            // Take the last occurrence of this capture that starts no later
            // than the first occurrence of the next capture
            var nextIndex = this.indexOf(subs[i+1], cursor);
            while (true) {
                var currentIndex = this.indexOf(subs[i], index);
                if (currentIndex !== -1 && currentIndex <= nextIndex)
                    index = currentIndex + 1;
                else
                    break;
            }
            index--;
        } else {
            // Last capture, or same text as the next one: plain forward search
            index = this.indexOf(subs[i], cursor);
        }
        cursor = index + subs[i].length;
        res.push([subs[i], index]);
    }
    return res;
};
console.log("aabccde".matchIndex(/(a).(b)(c.)d/));
// [ [ 'a', 1 ], [ 'b', 2 ], [ 'cc', 3 ] ]
console.log("aaa".matchIndex(/(a).(.)/));
// [ [ 'a', 0 ], [ 'a', 1 ] ] <-- problem here
console.log("bababaaaaa".matchIndex(/(ba)+.(a*)/));
// [ [ 'ba', 4 ], [ 'aaa', 6 ] ]
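As an alternative to searching the text, and not part of the original answer, engines that implement the ES2022 "d" (hasIndices) flag expose the start and end of every capture on the indices property of the match. The sketch below assumes such an engine; the name captureIndexesWithDFlag is hypothetical. It rebuilds the regex with the extra flag, so the pattern itself is not modified.

```javascript
// Sketch: read capture positions from match.indices (requires the "d" flag).
function captureIndexesWithDFlag(text, re) {
  // Rebuild the regex with "d" added, leaving the pattern itself untouched.
  const flags = re.flags.includes("d") ? re.flags : re.flags + "d";
  const match = new RegExp(re.source, flags).exec(text);
  if (match === null) return [];
  const found = [];
  for (let i = 1; i < match.length; i++) {
    if (match.indices[i] === undefined) continue; // group did not participate
    found.push([match[i], match.indices[i][0]]);  // [captured text, start index]
  }
  return found;
}

console.log(captureIndexesWithDFlag("aabccde", /(a).(b)(c.)d/));
// [ [ 'a', 0 ], [ 'b', 2 ], [ 'cc', 3 ] ]
console.log(captureIndexesWithDFlag("aaa", /(a).(.)/));
// [ [ 'a', 0 ], [ 'a', 2 ] ]
console.log(captureIndexesWithDFlag("bababaaaaa", /(ba)+.(a*)/));
// [ [ 'ba', 4 ], [ 'aaa', 7 ] ]
```

Because the positions come from the regex engine rather than from a text search, repeated substrings and uncaptured characters do not affect the result.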