Match everything but not quoted strings

Question

I want to match everything but no quoted strings.

I can match all quoted strings with this: /(("([^"\\]|\\.)*")|('([^'\\]|\\.)*'))/ So I tried to match everything but no quoted strings with this: /[^(("([^"\\]|\\.)*")|('([^'\\]|\\.)*'))]/ but it doesn't work.

I would like to use only regex because I will want to replace it and want to get the quoted text after it back.

string.replace(regex, function(a, b, c) {
   // return after a lot of operations
});

A quoted string is for me something like this "bad string" or this 'cool string'

So if I input:

he\'re is "watever o\"k" efre 'dder\'4rdr'?

it should output these matches:

["he\'re is ", " efre ", "?"]

And I don't want to replace them.

I know my question is very difficult but it is not impossible! Nothing is impossible.

Thanks

Recommended answer

Edit: Rewritten to cover more edge cases.

This can be done, but it's a bit complicated.

result = subject.match(/(?:(?=(?:(?:\\.|"(?:\\.|[^"\\])*"|[^\\'"])*'(?:\\.|"(?:\\.|[^"'\\])*"|[^\\'])*')*(?:\\.|"(?:\\.|[^"\\])*"|[^\\'])*$)(?=(?:(?:\\.|'(?:\\.|[^'\\])*'|[^\\'"])*"(?:\\.|'(?:\\.|[^'"\\])*'|[^\\"])*")*(?:\\.|'(?:\\.|[^'\\])*'|[^\\"])*$)(?:\\.|[^\\'"]))+/g);

will return

, he said. 
, she replied. 
, he reminded her. 
, 

from this string (line breaks added and enclosing quotes removed for clarity):

"Hello", he said. "What's up, \"doc\"?", she replied. 
'I need a 12" crash cymbal', he reminded her. 
"2\" by 4 inches", 'Back\"\'slashes \\ are OK!'

Explanation: (sort of, it's a bit mindboggling)

Breaking down the regex:

(?:
 (?=      # Assert even number of (relevant) single quotes, looking ahead:
  (?:
   (?:\\.|"(?:\\.|[^"\\])*"|[^\\'"])*
   '
   (?:\\.|"(?:\\.|[^"'\\])*"|[^\\'])*
   '
  )*
  (?:\\.|"(?:\\.|[^"\\])*"|[^\\'])*
  $
 )
 (?=      # Assert even number of (relevant) double quotes, looking ahead:
  (?:
   (?:\\.|'(?:\\.|[^'\\])*'|[^\\'"])*
   "
   (?:\\.|'(?:\\.|[^'"\\])*'|[^\\"])*
   "
  )*
  (?:\\.|'(?:\\.|[^'\\])*'|[^\\"])*
  $
 )
 (?:\\.|[^\\'"]) # Match text between quoted sections
)+

First, you can see that there are two similar parts. Both these lookahead assertions ensure that there is an even number of single/double quotes in the string ahead, disregarding escaped quotes and quotes of the opposite kind. I'll show it with the single quotes part:

(?=                   # Assert that the following can be matched:
 (?:                  # Match this group:
  (?:                 #  Match either:
   \\.                #  an escaped character
  |                   #  or
   "(?:\\.|[^"\\])*"  #  a double-quoted string
  |                   #  or
   [^\\'"]            #  any character except backslashes or quotes
  )*                  # any number of times.
  '                   # Then match a single quote
  (?:\\.|"(?:\\.|[^"'\\])*"|[^\\'])*'   # Repeat once to ensure even number,
                      # (but don't allow single quotes within nested double-quoted strings)
 )*                   # Repeat any number of times including zero
 (?:\\.|"(?:\\.|[^"\\])*"|[^\\'])*      # Then match the same until...
 $                    # ... end of string.
)                     # End of lookahead assertion.

The double quotes part works the same.

Then, at each position in the string where these two assertions succeed, the next part of the regex actually tries to match something:

(?:      # Match either
 \\.     # an escaped character
|        # or
 [^\\'"] # any character except backslash, single or double quote
)        # End of non-capturing group
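To see why the two lookaheads are needed, here is a minimal sketch that runs only this last group (with `+` and `/g` added) against the sample input from the question. The names `tokenOnly`, `question` and `runs` are illustrative and not part of the answer; `String.raw` is used only so the backslashes stay exactly as typed.

```javascript
// Only the final group of the answer's regex, without the two lookaheads.
const tokenOnly = /(?:\\.|[^\\'"])+/g;

// Sample input from the question, backslashes kept literally.
const question = String.raw`he\'re is "watever o\"k" efre 'dder\'4rdr'?`;

const runs = question.match(tokenOnly);
console.log(runs);
// Five runs come back, including the text inside the quotes:
//   he\'re is | watever o\"k | efre | dder\'4rdr | ?
```

On its own the group cannot tell whether a run sits inside or outside a quoted string; the lookaheads are what filter out the second and fourth runs.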

The whole thing is repeated once or more, as many times as possible. The /g modifier makes sure we get all matches in the string.
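As a quick check, the sketch below runs the full regex from this answer against the sample input from the question. The regex is copied unchanged; the variable names `unquotedRuns`, `input` and `matches` are illustrative, and `String.raw` is used only to keep the backslashes literal.

```javascript
// Regex copied verbatim from the answer.
const unquotedRuns = /(?:(?=(?:(?:\\.|"(?:\\.|[^"\\])*"|[^\\'"])*'(?:\\.|"(?:\\.|[^"'\\])*"|[^\\'])*')*(?:\\.|"(?:\\.|[^"\\])*"|[^\\'])*$)(?=(?:(?:\\.|'(?:\\.|[^'\\])*'|[^\\'"])*"(?:\\.|'(?:\\.|[^'"\\])*'|[^\\"])*")*(?:\\.|'(?:\\.|[^'\\])*'|[^\\"])*$)(?:\\.|[^\\'"]))+/g;

// Sample input from the question, backslashes kept literally.
const input = String.raw`he\'re is "watever o\"k" efre 'dder\'4rdr'?`;

const matches = input.match(unquotedRuns);
console.log(matches);
// Three runs, as requested in the question:
//   he\'re is | efre | ?
```

Note that both lookaheads scan to the end of the string at every position, so this is likely to get slow on long inputs.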

See it in action on RegExr.
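Since the question ultimately wants to call `replace` with a callback, a simpler alternative may be enough. This is not the technique used in the answer above: it is a sketch that matches either a whole quoted string or a run of unquoted text, and lets the callback decide what to do with each. The helper name `replaceUnquoted` and its `transform` parameter are hypothetical.

```javascript
// Hypothetical helper: apply `transform` to every unquoted run and
// return quoted strings unchanged.
function replaceUnquoted(text, transform) {
  // Group 1: a complete double- or single-quoted string (escapes allowed).
  // Group 2: a run of unquoted characters; an escaped quote counts as plain text.
  const token = /("(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')|((?:\\.|[^\\'"])+)/g;
  return text.replace(token, (match, quoted, unquoted) =>
    quoted !== undefined ? quoted : transform(unquoted)
  );
}

const sample = String.raw`he\'re is "watever o\"k" efre 'dder\'4rdr'?`;
const result = replaceUnquoted(sample, (s) => s.toUpperCase());
console.log(result);
// HE\'RE IS "watever o\"k" EFRE 'dder\'4rdr'?
```

Because the quoted alternative is tried first at each position, the regex consumes a quoted string in one step and never looks inside it. One assumption to be aware of: a quote with no closing partner matches neither alternative, so it is left in place and the text after it is treated as unquoted.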
