How do you access the matched groups in a JavaScript regular expression?


Question

I want to match a portion of a string using a regular expression and then access the parenthesized substring:

    var myString = "something format_abc"; // I want "abc"

    var arr = /(?:^|\s)format_(.*?)(?:\s|$)/.exec(myString);

    console.log(arr);     // Prints: [" format_abc", "abc"] .. so far so good.
    console.log(arr[1]);  // Prints: undefined  (???)
    console.log(arr[0]);  // Prints: format_undefined (!!!)

What am I doing wrong?

I've discovered that there was nothing wrong with the regular expression code above. The actual string I was testing against was this:

"date format_%A"

Reporting that "%A" is undefined seems very strange behaviour, but it is not directly related to this question, so I've opened a new one: "Why is a matched substring returning 'undefined' in JavaScript?"

The issue was that console.log treats its first argument like a printf format string. Because the string I was logging ("%A") looked like a format specifier, console.log tried to substitute the value of the next argument, and there was none.
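One way to sidestep this is to never pass untrusted text as the first argument. The sketch below is only an illustration: `logLiteral` is a hypothetical helper name, not something from the original post. It passes a fixed "%s" format string so the logged value is only ever a substitution argument.

```javascript
// Hypothetical helper: log a value without letting the console interpret
// any "%" sequences inside it as format specifiers.
function logLiteral(value) {
  // "%s" is the format string; the value is only a substitution argument.
  console.log("%s", String(value));
}

const formatMatch = /(?:^|\s)format_(.*?)(?:\s|$)/.exec("date format_%A");

logLiteral(formatMatch[0]); // the full match, including the leading space
logLiteral(formatMatch[1]); // the captured group
```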

Recommended answer

You can access capturing groups like this:

var myString = "something format_abc";
var myRegexp = /(?:^|\s)format_(.*?)(?:\s|$)/g;
// Equivalent constructor form (backslashes must be doubled inside a string literal):
// var myRegexp = new RegExp("(?:^|\\s)format_(.*?)(?:\\s|$)", "g");
var match = myRegexp.exec(myString);
console.log(match[1]); // abc
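Note that exec returns null when nothing matches, so indexing the result directly can throw. A minimal sketch of a guarded lookup follows; `extractFormat` is a hypothetical name introduced here for illustration.

```javascript
// Hypothetical helper: return the first capturing group, or null when the
// pattern does not match at all.
function extractFormat(input) {
  const match = /(?:^|\s)format_(.*?)(?:\s|$)/.exec(input);
  return match === null ? null : match[1];
}

console.log(extractFormat("something format_abc")); // abc
console.log(extractFormat("nothing to see here"));  // null
```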

If there are multiple matches, you can iterate over them:

var myString = "something format_abc";
// Backslashes are doubled because the pattern is written as a string literal.
var myRegexp = new RegExp("(?:^|\\s)format_(.*?)(?:\\s|$)", "g");
var match = myRegexp.exec(myString);
while (match != null) {
  // matched text: match[0]
  // match start: match.index
  // capturing group n: match[n]
  console.log(match[0]);
  match = myRegexp.exec(myString);
}
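To see the loop produce more than one result, here is a small sketch that collects every match into an array. It makes two assumptions that are not in the original answer: the helper name `collectMatches`, and a simplified pattern `/format_(\w+)/g`, used because the original pattern consumes the whitespace after each match and so would skip a second token separated by a single space.

```javascript
// Hypothetical helper: run a global regex over a string and collect the
// matched text, first group, and start index of every match.
function collectMatches(re, str) {
  if (!re.global) {
    throw new TypeError("collectMatches expects a regex with the g flag");
  }
  re.lastIndex = 0; // start from the beginning regardless of earlier use
  const results = [];
  let match;
  while ((match = re.exec(str)) !== null) {
    // A zero-length match does not advance lastIndex, so step past it.
    if (match[0] === "") re.lastIndex++;
    results.push({ text: match[0], group: match[1], index: match.index });
  }
  return results;
}

const found = collectMatches(/format_(\w+)/g, "format_abc and format_def");
console.log(found.map(m => m.group)); // [ 'abc', 'def' ]
```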

As you can see, iterating over multiple matches this way is not very intuitive. This led to the proposal of the String.prototype.matchAll method, which was standardized in ECMAScript 2020. It gives us a clean API and solves multiple problems. It is available in major browsers and JS engines, including Chrome 73+ / Node 12+ and Firefox 67+.

The method returns an iterator and is used as follows:

const string = "something format_abc";
const regexp = /(?:^|\s)format_(.*?)(?:\s|$)/g;
const matches = string.matchAll(regexp);

for (const match of matches) {
  console.log(match);
  console.log(match.index);
}
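Because each item the iterator yields is an ordinary match array, it can be destructured directly in the loop header. The sketch below assumes a simplified pattern, `/format_(\w+)/g`, and an input with two tokens; neither comes from the original answer.

```javascript
const text = "format_abc and format_def";
const pattern = /format_(\w+)/g; // simplified pattern, assumed for illustration

const names = [];
// Each yielded match is [fullMatch, group1, ...], so destructuring works.
for (const [whole, name] of text.matchAll(pattern)) {
  names.push(name);
}

console.log(names); // [ 'abc', 'def' ]
```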

Because it returns an iterator, the matching is lazy, which is useful when handling a particularly large number of matches or very large strings. If you need an array, the result can easily be converted with the spread syntax or the Array.from method:

function getFirstGroup(regexp, str) {
  const array = [...str.matchAll(regexp)];
  return array.map(m => m[1]);
}

// or:
function getFirstGroup(regexp, str) {
  return Array.from(str.matchAll(regexp), m => m[1]);
}

In environments that do not yet support the method, you can use the official shim package.

The internal workings of the method are also simple. An equivalent implementation using a generator function would be as follows:

function* matchAll(str, regexp) {
  const flags = regexp.global ? regexp.flags : regexp.flags + "g";
  const re = new RegExp(regexp, flags);
  let match;
  while (match = re.exec(str)) {
    yield match;
  }
}

A copy of the original regexp is created. This avoids side effects caused by the mutation of the lastIndex property while going through the multiple matches.
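The lastIndex side effect is easy to observe with a regex that has the g flag. In this sketch the pattern and variable names are assumptions made for illustration: the same regex object gives different answers on consecutive calls, while a copy made with the RegExp constructor starts from its own lastIndex of 0.

```javascript
const sharedRegexp = /format_(\w+)/g; // simplified pattern, assumed for illustration
const sample = "format_abc";

const firstTest = sharedRegexp.test(sample);     // matches, lastIndex moves to 10
const indexAfterFirst = sharedRegexp.lastIndex;

// A copy has its own lastIndex, starting at 0.
const copy = new RegExp(sharedRegexp, sharedRegexp.flags);
const copyIndex = copy.lastIndex;
const copyTest = copy.test(sample);

// The original resumes from index 10, finds nothing, and resets lastIndex to 0.
const secondTest = sharedRegexp.test(sample);

console.log(firstTest, indexAfterFirst, copyIndex, copyTest, secondTest);
```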

We also need to ensure the regexp has the global flag to avoid an infinite loop.
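The reason is that without the g flag, exec ignores lastIndex and always searches from the start, so a while loop over it would keep getting the same match. The sketch below (pattern and names are assumptions for illustration) shows this, and also shows that the native String.prototype.matchAll handles the case differently from the generator above: it throws a TypeError for a non-global regexp instead of adding the flag.

```javascript
const nonGlobal = /format_(\w+)/; // no g flag; simplified pattern for illustration
const input = "format_abc and format_def";

// Both calls find the same first match; lastIndex never advances.
const first = nonGlobal.exec(input);
const second = nonGlobal.exec(input);
console.log(first.index, second.index, nonGlobal.lastIndex);

// The native method rejects a non-global regexp outright.
let threwTypeError = false;
try {
  input.matchAll(nonGlobal);
} catch (err) {
  threwTypeError = err instanceof TypeError;
}
console.log(threwTypeError);
```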

I'm also glad to see that this StackOverflow question was referenced in the discussions of the proposal.
