Negating a backreference in Regular Expressions

Problem description

Suppose a string has this predictable format:

value = "hello and good morning"

Here the " (double quote) might also be ' (single quote), and the closing character (' or ") is always the same as the opening one. I want to match the string between the quotation marks.

\bvalue\s*=\s*(["'])([^\1]*)\1

(the two \s are to allow any spaces near the = sign)

The first capture group (inside the first pair of parentheses) should match the opening quotation mark, which is either ' or ". After that I want to allow any number of characters that are not whatever was captured in the first group, and then I expect the character captured in that group again (the closing quotation mark).

(The required string should be captured in the second capture group.)
This doesn't work, though.

This does:

\bvalue\s*=\s*(['"])([^"']*)["']

But I want to make sure that the opening and closing quotation marks (either double or single) are the same.
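To make the two failure modes concrete, here is a minimal sketch in Python's re module (an assumption; the question does not say which regex flavor is in use, and other flavors may treat [^\1] differently or reject it). In Python, \1 inside square brackets is read as the octal escape for the character U+0001, not as a backreference, so [^\1]* accepts quotes too and runs on to the last matching quote. The second pattern stops at whichever quote comes first. The helper second_group and the sample inputs are made up for illustration.

```python
import re

# The two patterns from the question, unchanged.
BACKREF_IN_CLASS = re.compile(r'''\bvalue\s*=\s*(["'])([^\1]*)\1''')
EITHER_QUOTE = re.compile(r'''\bvalue\s*=\s*(['"])([^"']*)["']''')


def second_group(pattern, text):
    """Return capture group 2 of the first match, or None if nothing matches."""
    m = pattern.search(text)
    return m.group(2) if m else None


# Hypothetical inputs, chosen to expose each problem.
two_values = 'value = "a" other = "b"'
apostrophe = "value = \"it's fine\""
mismatched = "value = \"abc'"

# [^\1]* does not stop at the first closing quote.
print(second_group(BACKREF_IN_CLASS, two_values))
# [^"']* stops at the apostrophe, and ["'] accepts it as the closer.
print(second_group(EITHER_QUOTE, apostrophe))
# Mismatched quotes are accepted as if they were a pair.
print(second_group(EITHER_QUOTE, mismatched))
```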


EDIT
The goal was basically to get the opening tag of an anchor that has a certain class name included in its class attribute, and I wanted to cover the rare case of the class attribute itself containing a (') or a (").

Following all of the advice here, I used the pattern:

<\s*\ba\b[^<>]+\bclass\s*=\s*("|'|\\"|\\')(?:(?!\1).)*\s*classname\s*(?:(?!\1).)*\1[^>]*>

Meaning:
Find a tag-open sign (<).
Allow any spaces.
Find the word a.
Allow any characters that are neither < nor >.
Find "class (any spaces) = (any spaces)".
Get the opening quote, one of the following: (" or ' or \" or \').
From Alan Moore's answer: allow any characters that are not the opening quote.
Find classname.
Allow any characters that are not the opening quote.
Find the closing quote, which is the same as the opening one.
Allow any characters that are not the tag-close sign (>).
Find the tag-close sign (>).
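The steps above can be tried out with a short sketch, again assuming Python's re module. The helper anchor_with_class and the sample markup are hypothetical; the pattern is the one from the edit, with the literal classname swapped for an escaped argument.

```python
import re


def anchor_with_class(class_name):
    """Compile the question's pattern with `classname` replaced by class_name."""
    return re.compile(
        r'''<\s*\ba\b[^<>]+\bclass\s*=\s*("|'|\\"|\\')(?:(?!\1).)*\s*'''
        + re.escape(class_name)  # treat the class name as literal text
        + r'''\s*(?:(?!\1).)*\1[^>]*>'''
    )


# Made-up markup: the first anchor carries the class, the second does not.
html = '<p><a href="/x" class="nav external big">link</a> <a class=\'plain\'>other</a></p>'

match = anchor_with_class("external").search(html)
print(match.group(0) if match else None)
```

One thing this makes easy to check: the pattern puts no word boundary around the class name, so as written it also matches when the name is only part of a longer class (searching for nav finds class="navbar"). Whether that matters depends on the markup being searched.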

Solution

Instead of a negated character class, you have to use a negative lookahead:

\bvalue\s*=\s*(["'])(?:(?!\1).)*\1

(?:(?!\1).)* consumes one character at a time, each time only after the lookahead has confirmed that the character is not whatever was matched by the capturing group, (["']). A character class, negated or not, can only match one character at a time. As far as the regex engine knows, \1 could represent any number of characters, and there's no way to convince it that \1 will only contain " or ' in this case. So you have to go with the more general (and less readable) solution.
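A minimal sketch of this in Python's re module (the flavor is an assumption). It makes one change to the pattern: the lookahead loop is wrapped in an extra pair of parentheses so that the quoted text lands in group 2, which is what the question asked for. The name quoted_value and the sample strings are made up for illustration.

```python
import re

# Same pattern as above, plus an added capturing group around the
# (?:(?!\1).)* loop so the text between the quotes is available as group 2.
QUOTED_VALUE = re.compile(r'''\bvalue\s*=\s*(["'])((?:(?!\1).)*)\1''')


def quoted_value(text):
    """Return the text between matching quotes after `value =`, or None."""
    m = QUOTED_VALUE.search(text)
    return m.group(2) if m else None


print(quoted_value('value = "hello and good morning"'))
print(quoted_value("value = \"it's fine\""))      # the other quote type is allowed inside
print(quoted_value("value = 'say \"hi\"'"))
print(quoted_value('value = "a" other = "b"'))    # stops at the first matching quote
print(quoted_value("value = \"abc'"))             # mismatched quotes: no match
```

Note that . does not match a newline by default in Python, so a value spanning several lines would need the re.DOTALL flag.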
