Regex to match all the strings between two identical strings

Problem description

For example, I have this string:

-- This -- is -- one -- another -- comment --

I want the matched elements to be "This", "is", "one", "another", and "comment".

I tried this regex:

--\s+([^--]+)\s+--

but it only gives me the matched elements "This", "one" and "comment".

I have searched other questions, and they all provide solutions for a case like #A#, where I get A. For #A#B#, however, I also get only A, whereas I want both A and B, since each of them sits between two # chars.

I am testing this against JavaScript regex, but I think the solution should be independent of platform/language.

Solution

In general, you need to use a pattern like

STRING([\s\S]*?)(?=STRING|$)

It matches STRING, then captures into Group 1 any zero or more chars, as few as possible, up to the first occurrence of STRING or the end of the string, stopping right before it. This works because (?=...) is a positive lookahead: a zero-width assertion that does not consume the text it matches.
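As a rough illustration, the pattern can be assembled for an arbitrary delimiter. The helper names escapeRegExp and matchAfterDelimiter below are hypothetical and not part of the original answer; the delimiter is escaped so that it is treated as literal text.

```javascript
// Hypothetical helper: escape regex metacharacters so the delimiter is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: build STRING([\s\S]*?)(?=STRING|$) for a given delimiter
// and collect Group 1 of every match.
function matchAfterDelimiter(input, delimiter) {
  var d = escapeRegExp(delimiter);
  var rx = new RegExp(d + '([\\s\\S]*?)(?=' + d + '|$)', 'g');
  var m, res = [];
  while ((m = rx.exec(input))) {
    res.push(m[1]);
  }
  return res;
}

console.log(matchAfterDelimiter('#A#B#C', '#')); // [ 'A', 'B', 'C' ]
console.log(matchAfterDelimiter('#A#B#', '#'));  // [ 'A', 'B', '' ]
```

Note the empty last item in the second call: with a trailing delimiter, the |$ alternative lets the final # match with an empty Group 1. If only text enclosed on both sides is wanted, dropping |$ from the lookahead avoids that.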

A generic variation of the pattern is

STRING((?:(?!STRING)[\s\S])*)

It uses a tempered greedy token, (?:(?!STRING)[\s\S])*, that matches any char, 0 or more times, as long as that char does not start a STRING char sequence.
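A small sketch comparing the two variants, with -- standing in for STRING. The sample input and the helper name collectGroup1 are assumptions made for illustration only.

```javascript
// Hypothetical helper: collect Group 1 from every match of a global regex.
function collectGroup1(rx, text) {
  var m, res = [];
  rx.lastIndex = 0; // always scan from the start of the text
  while ((m = rx.exec(text))) {
    res.push(m[1]);
  }
  return res;
}

var sample = '--a-b--c--'; // assumed input; note the single hyphen inside "a-b"

var lazy = /--([\s\S]*?)(?=--|$)/g;      // lazy dot-all plus lookahead
var tempered = /--((?:(?!--)[\s\S])*)/g; // tempered greedy token

console.log(collectGroup1(lazy, sample));     // [ 'a-b', 'c', '' ]
console.log(collectGroup1(tempered, sample)); // [ 'a-b', 'c', '' ]
```

Both variants keep the single hyphen in "a-b", because the check is against the whole -- sequence rather than against an individual - char.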

To get all the substrings with your current approach, replace the consuming \s+-- at the end with a lookahead:

/--\s+([\s\S]*?)(?=\s+--)/g
                ^^^^^^^^^

See the regex demo.

Note that [^--]+ matches 1 or more symbols other than -; it does not match "any text that is not equal to --". [...] is a character class, and it matches a single character. To match text of any length from a given char up to the first occurrence of a pattern, you can rely on the [\s\S]*? construct: any 0+ chars, as few as possible (due to the lazy *? quantifier).
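The difference can be checked with a short sketch. The second sample string, with a hyphenated word, and the helper name collectGroup1 are made up for illustration and are not part of the original question.

```javascript
// Hypothetical helper: collect Group 1 from every match of a global regex.
function collectGroup1(rx, text) {
  var m, res = [];
  rx.lastIndex = 0; // always scan from the start of the text
  while ((m = rx.exec(text))) {
    res.push(m[1]);
  }
  return res;
}

// Regex from the question: [^--] is the same as [^-], and the closing -- is consumed.
var original = /--\s+([^--]+)\s+--/g;
// Regex from the answer: lazy match, closing -- is only checked by the lookahead.
var fixed = /--\s+([\s\S]*?)(?=\s+--)/g;

var s1 = '-- This -- is -- one -- another -- comment --';
console.log(collectGroup1(original, s1)); // [ 'This', 'one', 'comment' ]
console.log(collectGroup1(fixed, s1));    // [ 'This', 'is', 'one', 'another', 'comment' ]

var s2 = '-- one-two -- three --'; // assumed sample with a hyphen inside an item
console.log(collectGroup1(original, s2)); // [ 'three' ]
console.log(collectGroup1(fixed, s2));    // [ 'one-two', 'three' ]
```

The original regex skips every other item because each match consumes the -- that the next item needs as its opening delimiter, and it loses "one-two" entirely because [^-]+ cannot cross the single hyphen.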

JS demo:

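var s = '-- This -- is -- one -- another -- comment --';
// The lookahead (?=\s+--) checks for the closing -- without consuming it,
// so the same -- can open the next match.
var rx = /--\s+([\s\S]*?)(?=\s+--)/g;
var m, res = [];
// With the g flag, exec() resumes from rx.lastIndex on every call.
while ((m = rx.exec(s))) {
  res.push(m[1]); // Group 1: the text between the delimiters
}
console.log(res); // [ 'This', 'is', 'one', 'another', 'comment' ]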
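In engines that support String.prototype.matchAll (ES2020 and later), the same result can be collected without the explicit exec loop. This is an alternative sketch, not part of the original answer.

```javascript
var s = '-- This -- is -- one -- another -- comment --';
var rx = /--\s+([\s\S]*?)(?=\s+--)/g; // matchAll requires the g flag

// Map each match object to its first capture group.
var res = Array.from(s.matchAll(rx), function (m) {
  return m[1];
});
console.log(res); // [ 'This', 'is', 'one', 'another', 'comment' ]
```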
