Matching quote wrapped strings in javascript with regex


Question

I need a JavaScript regex that matches

"{any group of chars}" <-- where that last " is not preceded by a \

Examples:

... foo "bar" ...  => "bar"
... foo"bar\"" ... => "bar\""
... foo "bar" ...  goo"o"ooogle "t\"e\"st"[] => ["bar", "o", "t\"e\"st"]

The actual strings will be longer and may contain multiple matches, which could also include whitespace or regex special characters.

I started by trying to break down the syntax, but I'm not strong with regex and got stuck pretty fast. I did get as far as matching everything except the case where the match contains \" (I think) ...

https://regex101.com/r/sj4HXw/1

Update:

More about my situation ...

This regex is to be used to "syntax highlight" strings in code blocks embedded in my blog posts, so a real-world example might look something like this ...

<pre id="test" class="code" data-code="csharp">
   if (ConfigurationManager.AppSettings["LogSql"] == "true")
</pre>

And I am using the following JavaScript to achieve the highlight:

var result = $("#test").text().replace(/"[^"\\]*(?:\\[\s\S][^"\\]*)*"/g, "<span class=\"string\">$1</span>");
$("#test").html(result);

For some reason, even when the suggested answers (so far at least) are used in this context, I'm getting odd results.

This works, but for some reason it inserts the literal text $1 instead of the actual match.

Recommended answer

Simple scenario (as in OP)

The most efficient regex you can use here, written in accordance with the unroll-the-loop principle, is

"[^"\\]*(?:\\[\s\S][^"\\]*)*"

See the regex demo.

Details:

  • " - matches the opening "
  • [^"\\]* - zero or more chars other than " and \
  • (?:\\[\s\S][^"\\]*)* - zero or more occurrences of:
    • \\[\s\S] - any char ([\s\S]) with a \ in front
    • [^"\\]* - zero or more chars other than " and \
  • " - matches the closing "
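To see what the unrolling changes, it can help to put the pattern next to the more obvious alternation form. The sketch below is only an illustration: the names naive, unrolled, and sample are introduced here and are not part of the original answer. Both patterns are expected to find the same strings; the unrolled one consumes runs of ordinary characters in one step instead of choosing between two alternatives at every character.

```javascript
// Alternation form: at each position, try "ordinary char" or "escaped char".
const naive = /"(?:[^"\\]|\\[\s\S])*"/g;

// Unrolled form from the answer: a run of ordinary chars,
// then zero or more (escaped char + run of ordinary chars).
const unrolled = /"[^"\\]*(?:\\[\s\S][^"\\]*)*"/g;

// The actual string value contains: "t\"e\"st" (one backslash before each inner quote).
const sample = '... foo "bar" ...  goo"o"ooogle "t\\"e\\"st"[]';

const naiveMatches = sample.match(naive);
const unrolledMatches = sample.match(unrolled);

console.log(unrolledMatches);
// Compare the two result lists element by element via their JSON form.
console.log(JSON.stringify(naiveMatches) === JSON.stringify(unrolledMatches));
```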

Usage:

      // MATCHING
      var rx = /"[^"\\]*(?:\\[\s\S][^"\\]*)*"/g;
      var s = '    ... foo "bar" ...  goo"o"ooogle "t\\"e\\"st"[]';
      var res = s.match(rx);
      console.log(res);
      
      // REPLACING
      console.log(s.replace(rx, '<span>$&</span>'));
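This also explains the literal $1 in the question: the pattern there has no capturing group, and in JavaScript's String.prototype.replace a $n reference to a group that does not exist is copied into the output as-is, whereas $& stands for the whole match. A minimal sketch using the line from the question's example (the variable names line, broken, and fixed are illustrative):

```javascript
const rx = /"[^"\\]*(?:\\[\s\S][^"\\]*)*"/g;
const line = 'if (ConfigurationManager.AppSettings["LogSql"] == "true")';

// rx has no capture group, so "$1" ends up in the result literally.
const broken = line.replace(rx, '<span class="string">$1</span>');

// "$&" inserts the whole matched substring instead.
const fixed = line.replace(rx, '<span class="string">$&</span>');

console.log(broken);
console.log(fixed);
```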

If there is an escaped " before a valid match, or there are backslashes before a ", the approach above won't work. You will need to match those backslashes and capture the substring you need.

      /(?:^|[^\\])(?:\\{2})*("[^"\\]*(?:\\[\s\S][^"\\]*)*")/g
       ^^^^^^^^^^^^^^^^^^^^^^                             ^
      

See another regex demo.

Usage:

      // MATCHING
      var rx = /(?:^|[^\\])(?:\\{2})*("[^"\\]*(?:\\[\s\S][^"\\]*)*")/g;
      var s = '    ... \\"foo "bar" ...  goo"o"ooogle "t\\"e\\"st"[]';
      var m, res=[];
      while (m = rx.exec(s)) {
        res.push(m[1]);
      }
      console.log(res);
      
      // REPLACING
      console.log(s.replace(/((?:^|[^\\])(?:\\{2})*)("[^"\\]*(?:\\[\s\S][^"\\]*)*")/g, '$1<span>$2</span>'));

The main pattern is wrapped in capturing parentheses, and the following is added at the start:

  • (?:^|[^\\]) - either the start of the string or any char other than \
  • (?:\\{2})* - zero or more occurrences of a double backslash
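The effect of that prefix is easiest to see by running both patterns on an input that starts with an escaped quote and also has a quote preceded by two backslashes. This is a small sketch under assumed names (simple, guarded, and input are not from the original answer); it uses String.prototype.matchAll to collect capture group 1.

```javascript
const simple = /"[^"\\]*(?:\\[\s\S][^"\\]*)*"/g;
const guarded = /(?:^|[^\\])(?:\\{2})*("[^"\\]*(?:\\[\s\S][^"\\]*)*")/g;

// The actual string value is: \"foo "bar" \\"baz"
const input = '\\"foo "bar" \\\\"baz"';

// The simple pattern starts at the escaped quote and gets out of step.
const simpleMatches = input.match(simple);

// The guarded pattern skips the escaped quote; group 1 holds the quoted string.
const guardedMatches = Array.from(input.matchAll(guarded), (m) => m[1]);

console.log(simpleMatches);
console.log(guardedMatches);
```

Because the prefix consumes the character in front of the opening quote, the quoted text has to be read from the capture group rather than from the whole match.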
