Match everything but not quoted strings
Problem description
I want to match everything except quoted strings.
I can match all quoted strings with this: /(("([^"\\]|\\.)*")|('([^'\\]|\\.)*'))/
So I tried to match everything except quoted strings with this: /[^(("([^"\\]|\\.)*")|('([^'\\]|\\.)*'))]/
but it doesn't work.
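A closer look at the attempt shows why it fails: square brackets always create a character class, which matches a single character from a set (or, with `^`, a single character outside it); a class cannot negate a whole sub-pattern. Here the class ends at the first unescaped `]`, giving just `[^(("([^"\\]`, and the leftover text contains a `)` with no matching `(`, so JavaScript refuses to compile the pattern at all. A quick check, where `compiles` is a helper name made up for this sketch:

```javascript
// Returns true if the given source string compiles as a JavaScript regex.
function compiles(source) {
  try {
    new RegExp(source);
    return true;
  } catch (e) {
    return false; // new RegExp throws a SyntaxError for an invalid pattern
  }
}

// The attempted "negated" pattern from the question, as a raw string.
// The character class is only [^(("([^"\\] ; the rest has an unmatched ")".
const attempt = String.raw`[^(("([^"\\]|\\.)*")|('([^'\\]|\\.)*'))]`;
console.log(compiles(attempt)); // false

// The quoted-string pattern from the question compiles fine.
const quoted = String.raw`(("([^"\\]|\\.)*")|('([^'\\]|\\.)*'))`;
console.log(compiles(quoted)); // true
```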
I would like to use only a regex, because I want to replace the matched text and then get the quoted text back in place afterwards.
string.replace(regex, function(a, b, c) {
// return after a lot of operations
});
For me, a quoted string is something like "bad string" or 'cool string'.
So if I input:
he\'re is "watever o\"k" efre 'dder\'4rdr'?
it should output these matches:
["he\'re is ", " efre ", "?"]
And then I want to replace them.
I know my question is very difficult but it is not impossible! Nothing is impossible.
Thanks.
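For the replace-and-keep-the-quotes goal, a different and commonly used technique (not the one in the answer below) avoids lookaheads entirely: match either a complete quoted string or a run of unquoted text, and let the `replace` callback check which capture group took part. A minimal sketch, assuming backslash escapes behave as in the question's examples; `TOKEN`, `replaceOutsideQuotes` and `unquotedParts` are hypothetical names:

```javascript
// Group 1: a complete "..." or '...' string (backslash escapes allowed inside).
// Group 2: a run of unquoted text, where a backslash escapes the next character.
const TOKEN = /("(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')|((?:\\.|[^"'\\])+)/g;

// Hypothetical helper: applies `transform` to unquoted runs only and
// returns quoted strings unchanged.
function replaceOutsideQuotes(text, transform) {
  return text.replace(TOKEN, (match, quoted, unquoted) =>
    unquoted !== undefined ? transform(unquoted) : quoted
  );
}

// Hypothetical helper: collects just the unquoted runs.
function unquotedParts(text) {
  const parts = [];
  for (const m of text.matchAll(TOKEN)) {
    if (m[2] !== undefined) parts.push(m[2]);
  }
  return parts;
}

const input = String.raw`he\'re is "watever o\"k" efre 'dder\'4rdr'?`;
console.log(unquotedParts(input)); // [ "he\\'re is ", " efre ", "?" ]
console.log(replaceOutsideQuotes(input, (s) => s.toUpperCase()));
// HE\'RE IS "watever o\"k" EFRE 'dder\'4rdr'?
```

One caveat of this sketch: an opening quote with no closing partner is matched by neither alternative, so it is skipped and the text after it is treated as unquoted.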
Recommended answer
Edit: Rewritten to cover more edge cases.
This can be done, but it's a bit complicated.
result = subject.match(/(?:(?=(?:(?:\\.|"(?:\\.|[^"\\])*"|[^\\'"])*'(?:\\.|"(?:\\.|[^"'\\])*"|[^\\'])*')*(?:\\.|"(?:\\.|[^"\\])*"|[^\\'])*$)(?=(?:(?:\\.|'(?:\\.|[^'\\])*'|[^\\'"])*"(?:\\.|'(?:\\.|[^'"\\])*'|[^\\"])*")*(?:\\.|'(?:\\.|[^'\\])*'|[^\\"])*$)(?:\\.|[^\\'"]))+/g);
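To try the regex on the input from the question, it can be wrapped in a small function. A sketch; the name `unquotedSegments` is ours, and `|| []` is added because `match` with the `g` flag returns `null` when there is no match:

```javascript
// The regex from the answer, unchanged.
const UNQUOTED = /(?:(?=(?:(?:\\.|"(?:\\.|[^"\\])*"|[^\\'"])*'(?:\\.|"(?:\\.|[^"'\\])*"|[^\\'])*')*(?:\\.|"(?:\\.|[^"\\])*"|[^\\'])*$)(?=(?:(?:\\.|'(?:\\.|[^'\\])*'|[^\\'"])*"(?:\\.|'(?:\\.|[^'"\\])*'|[^\\"])*")*(?:\\.|'(?:\\.|[^'\\])*'|[^\\"])*$)(?:\\.|[^\\'"]))+/g;

// Returns every run of text that lies outside quoted strings.
function unquotedSegments(subject) {
  return subject.match(UNQUOTED) || []; // match() gives null if nothing matches
}

const question = String.raw`he\'re is "watever o\"k" efre 'dder\'4rdr'?`;
console.log(unquotedSegments(question)); // [ "he\\'re is ", " efre ", "?" ]
```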
This will return
, he said.
, she replied.
, he reminded her.
,
from this string (line breaks added and enclosing quotes removed for clarity):
"Hello", he said. "What's up, \"doc\"?", she replied.
'I need a 12" crash cymbal', he reminded her.
"2\" by 4 inches", 'Back\"\'slashes \\ are OK!'
Explanation (sort of; it's a bit mind-boggling):
Breaking down the regex:
(?:
(?= # Assert even number of (relevant) single quotes, looking ahead:
(?:
(?:\\.|"(?:\\.|[^"\\])*"|[^\\'"])*
'
(?:\\.|"(?:\\.|[^"'\\])*"|[^\\'])*
'
)*
(?:\\.|"(?:\\.|[^"\\])*"|[^\\'])*
$
)
(?= # Assert even number of (relevant) double quotes, looking ahead:
(?:
(?:\\.|'(?:\\.|[^'\\])*'|[^\\'"])*
"
(?:\\.|'(?:\\.|[^'"\\])*'|[^\\"])*
"
)*
(?:\\.|'(?:\\.|[^'\\])*'|[^\\"])*
$
)
(?:\\.|[^\\'"]) # Match text between quoted sections
)+
First, you can see that there are two similar parts. Both these lookahead assertions ensure that there is an even number of single/double quotes in the string ahead, disregarding escaped quotes and quotes of the opposite kind. I'll show it with the single quotes part:
(?= # Assert that the following can be matched:
(?: # Match this group:
(?: # Match either:
\\. # an escaped character
| # or
"(?:\\.|[^"\\])*" # a double-quoted string
| # or
[^\\'"] # any character except backslashes or quotes
)* # any number of times.
' # Then match a single quote
(?:\\.|"(?:\\.|[^"'\\])*"|[^\\'])*' # Repeat once to ensure even number,
# (but don't allow single quotes within nested double-quoted strings)
)* # Repeat any number of times including zero
(?:\\.|"(?:\\.|[^"\\])*"|[^\\'])* # Then match the same until...
$ # ... end of string.
) # End of lookahead assertion.
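To see the single-quote assertion on its own, its body can be anchored with `^` and tested against whole strings; this is the check the lookahead performs from position 0, and the full pattern repeats it at every position. A sketch; `hasEvenSingleQuotes` is a made-up name:

```javascript
// Lookahead body from the answer, anchored with ^ instead of wrapped in (?=...).
const EVEN_SINGLE =
  /^(?:(?:\\.|"(?:\\.|[^"\\])*"|[^\\'"])*'(?:\\.|"(?:\\.|[^"'\\])*"|[^\\'])*')*(?:\\.|"(?:\\.|[^"\\])*"|[^\\'])*$/;

// True if single quotes (ignoring escaped ones and any inside "...") pair up.
function hasEvenSingleQuotes(text) {
  return EVEN_SINGLE.test(text);
}

console.log(hasEvenSingleQuotes("a'b'c"));           // true: one pair
console.log(hasEvenSingleQuotes("a'b"));             // false: unpaired quote
console.log(hasEvenSingleQuotes(String.raw`it\'s`)); // true: escaped quote ignored
console.log(hasEvenSingleQuotes(`a"'"b`));           // true: quote inside "..." ignored
```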
The double quotes part works the same.
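Since the two assertions mirror each other, the double-quote body can be produced by swapping every `'` with `"` in the single-quote body. The result differs from the answer's version only in the order of characters inside one class (`[^\\"']` instead of `[^\\'"]`), which does not change what it matches. A sketch; `swapQuotes` and `EVEN_DOUBLE` are hypothetical names:

```javascript
// Single-quote assertion body from the answer, anchored so it can be tested alone.
const singleSource = String.raw`^(?:(?:\\.|"(?:\\.|[^"\\])*"|[^\\'"])*'(?:\\.|"(?:\\.|[^"'\\])*"|[^\\'])*')*(?:\\.|"(?:\\.|[^"\\])*"|[^\\'])*$`;

// Exchanges every ' with " and vice versa.
function swapQuotes(source) {
  return source.replace(/['"]/g, (q) => (q === "'" ? '"' : "'"));
}

// Equivalent to the answer's double-quote assertion body, anchored.
const EVEN_DOUBLE = new RegExp(swapQuotes(singleSource));

console.log(EVEN_DOUBLE.test('say "hi"')); // true
console.log(EVEN_DOUBLE.test('say "hi'));  // false
console.log(EVEN_DOUBLE.test(`a '"' b`));  // true: " inside '...' is ignored
```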
Then, at each position in the string where these two assertions succeed, the next part of the regex actually tries to match something:
(?: # Match either
\\. # an escaped character
| # or
[^\\'"] # any character except backslash, single or double quote
) # End of non-capturing group
The whole thing is repeated once or more, as many times as possible. The /g modifier makes sure we get all matches in the string.
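Putting the pieces together: the sketch below rebuilds the full pattern from the two assertion bodies and the consumer shown in the breakdown, and contrasts it with the consumer alone, which also matches text inside quotes. The variable names are ours; the regex fragments are copied from the answer.

```javascript
// Assertion bodies from the breakdown (each ends in $, as in the answer).
const singleBody = String.raw`(?:(?:\\.|"(?:\\.|[^"\\])*"|[^\\'"])*'(?:\\.|"(?:\\.|[^"'\\])*"|[^\\'])*')*(?:\\.|"(?:\\.|[^"\\])*"|[^\\'])*$`;
const doubleBody = String.raw`(?:(?:\\.|'(?:\\.|[^'\\])*'|[^\\'"])*"(?:\\.|'(?:\\.|[^'"\\])*'|[^\\"])*")*(?:\\.|'(?:\\.|[^'\\])*'|[^\\"])*$`;
// One escaped character, or one character that is not a backslash or quote.
const consumer = String.raw`(?:\\.|[^\\'"])`;

// Both lookaheads must succeed at a position before its character is consumed.
const full = new RegExp(`(?:(?=${singleBody})(?=${doubleBody})${consumer})+`, "g");
// Without the lookaheads, runs inside quotes are matched as well.
const consumerOnly = new RegExp(`${consumer}+`, "g");

const text = 'a "b" c';
console.log(text.match(consumerOnly)); // [ "a ", "b", " c" ]
console.log(text.match(full));         // [ "a ", " c" ]
```

The assembled source is character-for-character the same as the regex literal at the top of the answer, so building it from named parts is mainly a readability aid.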