Comprehensive RegExp to remove JavaScript comments

Question

I need to dependably remove all JavaScript comments with a single regular expression.

I have searched StackOverflow and other sites, but none of the answers take into account alternating quotes, multi-line comments, comments within strings, regular expressions, etc.

Is there a regular expression that can remove the comments from this:

var test = [
    "// Code",
    '// Code',
    "'// Code",
    '"// Code',
    //" Comment",
    //' Comment',
    /* Comment */
    // Comment /* Comment
    /* Comment
     Comment // */ "Code",
    "Code",
    "/* Code */",
    "/* Code",
    "Code */",
    '/* Code */',
    '/* Code',
    'Code */',
    /* Comment
    "Comment",
    Comment */ "Code",
    /Code\/*/,
    "Code */"
]

Here's a jsbin or jsfiddle to test it.

Recommended answer

I like challenges :)

Here's my working solution:

/((["'])(?:\\[\s\S]|.)*?\2|\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\/)|\/\/.*?$|\/\*[\s\S]*?\*\//gm

Replace it with $1.
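As a minimal sketch of how the replacement might be wired up (the helper name `stripComments` and the sample input are illustrative assumptions, not part of the original answer):

```javascript
// The regex from the answer, copied verbatim as a regex literal.
const COMMENT_RE = /((["'])(?:\\[\s\S]|.)*?\2|\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\/)|\/\/.*?$|\/\*[\s\S]*?\*\//gm;

// Hypothetical helper: strings and regex literals are captured in group 1
// and written back via "$1"; comments match outside group 1, so "$1" is
// empty for them and they disappear.
function stripComments(source) {
  return source.replace(COMMENT_RE, "$1");
}

const sample = [
  'var a = "// not a comment"; // a comment',
  "var b = 1; /* block",
  "comment */ var c = 2;",
].join("\n");

// The string literal survives; the line comment and the block comment are removed.
console.log(stripComments(sample));
```

Whitespace around the removed comments is left in place, so the output may contain trailing or doubled spaces.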

Fiddle here: http://jsfiddle.net/LucasTrz/DtGq8/6/

Of course, as has been pointed out countless times, a proper parser would probably be better, but still...

NB: I used a regex literal in the fiddle instead of a regex string; too much escaping can destroy your brain.

((["'])(?:\\[\s\S]|.)*?\2|\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\/) <-- the part to keep
|\/\/.*?$                                                         <-- line comments
|\/\*[\s\S]*?\*\/                                                 <-- inline comments

The part to keep

(["'])(?:\\[\s\S]|.)*?\2                   <-- strings
\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\/     <-- regex literals

Strings

    ["']              match a quote and capture it
    (?:\\[\s\S]|.)*?  match escaped characters or unescaped characters, don't capture
    \2                match the same type of quote as the one that opened the string
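To see the string part in isolation, here is a small sketch. It assumes the sub-pattern is lifted out of the full regex, so the back-reference becomes `\1` instead of `\2`; the constant name and the sample text are illustrative only.

```javascript
// String sub-pattern on its own. The quote is now the first capture group,
// so the back-reference is \1 rather than \2.
const STRING_RE = /(["'])(?:\\[\s\S]|.)*?\1/g;

// String.raw keeps the backslashes exactly as typed.
const code = String.raw`a = "it's \"quoted\""; b = 'say "hi"';`;

const matches = code.match(STRING_RE);

// Two literals are found: the double-quoted one (its escaped quotes and the
// apostrophe do not end it) and the single-quoted one.
console.log(matches);
```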

Regex literals

    \/                          match a forward slash
    (?![*\/])                   ... not followed by a * or / (that would start a comment)
    (?:\\.|\[(?:\\.|.)\]|.)*?   match any sequence of escaped/unescaped text, or a regex character class
    \/                          ... until the closing slash
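The regex-literal part can be tried on its own as well. This is only a sketch: `matchedText` is a hypothetical helper, and the inputs are taken from or modelled on the question's test data.

```javascript
// Regex-literal sub-pattern on its own (no g flag, first match only).
const REGEX_LITERAL_RE = /\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\//;

// Hypothetical helper: returns the matched text, or null if nothing matched.
function matchedText(snippet) {
  const m = snippet.match(REGEX_LITERAL_RE);
  return m ? m[0] : null;
}

console.log(matchedText(String.raw`/Code\/*/`)); // the whole literal, escaped slash included
console.log(matchedText("/[/]/"));               // the whole literal, slash inside the class included
console.log(matchedText("// Comment"));          // null: the lookahead rejects a second slash
console.log(matchedText("/* Comment */"));       // null: the lookahead rejects the asterisk
```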

The part to remove

|\/\/.*?$              <-- line comments
|\/\*[\s\S]*?\*\/      <-- inline comments

Line comments

    \/\/         match two forward slashes
    .*?$         then everything until the end of the line

Inline comments

    \/\*         match /*
    [\s\S]*?     then as few as possible of anything, see note below
    \*\/         match */
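A short sketch of why the two comment alternatives are not enough by themselves and need the "keep" group in front of them. The constant names and the sample line are assumptions made for illustration.

```javascript
// Only the two "remove" alternatives: line comments and /* ... */ comments.
const COMMENTS_ONLY_RE = /\/\/.*?$|\/\*[\s\S]*?\*\//gm;

// The full regex from the answer, with the "keep" group in front.
const FULL_RE = /((["'])(?:\\[\s\S]|.)*?\2|\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\/)|\/\/.*?$|\/\*[\s\S]*?\*\//gm;

const line = 'var s = "// Code"; // Comment';

// Without the keep group, the // inside the string is taken for a comment
// and everything after it on the line is lost.
const naive = line.replace(COMMENTS_ONLY_RE, "");

// With the keep group, the string is consumed first and written back as $1,
// so only the real comment is removed.
const guarded = line.replace(FULL_RE, "$1");

console.log(JSON.stringify(naive));
console.log(JSON.stringify(guarded));
```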

I had to use [\s\S] instead of . because, at the time of writing, JavaScript did not support the regex s modifier (singleline, which allows . to match newlines as well). ES2018 later added the s (dotAll) flag, so in current engines . combined with that flag would also work.
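The difference is easy to check with a comment that spans two lines. This sketch assumes an engine that implements the ES2018 `s` (dotAll) flag, such as any current Node.js or browser; the variable names are illustrative.

```javascript
const text = "/* first line\nsecond line */";

// Plain dot: stops at the newline, so the closing */ is never reached.
const withDot = /\/\*.*?\*\//;

// [\s\S] matches any character, newlines included.
const withClass = /\/\*[\s\S]*?\*\//;

// Assumes ES2018 support: the s flag lets the dot match newlines too.
const withDotAll = /\/\*.*?\*\//s;

console.log(withDot.test(text));    // false
console.log(withClass.test(text));  // true
console.log(withDotAll.test(text)); // true
```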

This regex will work in the following corner cases:

  • Regex patterns that contain / in a character class: /[/]/
  • Escaped newlines in string literals
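Both corner cases can be exercised together. The input below is made up for illustration: the first line holds a regex literal with a slash inside a character class, and the string on the next line continues across an escaped newline and contains something that looks like a comment.

```javascript
// The regex from the answer, copied verbatim.
const COMMENT_RE = /((["'])(?:\\[\s\S]|.)*?\2|\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\/)|\/\/.*?$|\/\*[\s\S]*?\*\//gm;

const source = [
  "var re = /[/]/; // comment 1",
  'var s = "a \\',                              // the line ends with a backslash
  '// still inside the string"; // comment 2',
].join("\n");

const cleaned = source.replace(COMMENT_RE, "$1");

// Only "comment 1" and "comment 2" are removed; the regex literal and the
// two-line string (including its "// still inside the string" part) remain.
console.log(cleaned);
```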

And just for the fun of it... here's the eye-bleeding hardcore version:

/((["'])(?:\\[\s\S]|.)*?\2|(?:[^\w\s]|^)\s*\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\/(?=[gmiy]{0,4}\s*(?![*\/])(?:\W|$)))|\/\/.*?$|\/\*[\s\S]*?\*\//gm

This adds support for the following twisted edge cases (fiddle, regex101):

Code = /* Comment */ /Code regex/g  ; // Comment
Code = Code / Code /* Comment */ /g  ; // Comment    
Code = /Code regex/g /* Comment */  ; // Comment

This is highly heuristic code; you probably shouldn't use it (even less so than the previous regex) and should just let that edge case blow.
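To illustrate what the heuristic buys, here is a sketch that runs both versions on the second edge-case line above, where a division operator precedes a comment. The constant names and the `normalize` helper are assumptions for the demo, and whitespace is collapsed only to make the results easier to compare.

```javascript
// First version from the answer.
const SIMPLE_RE = /((["'])(?:\\[\s\S]|.)*?\2|\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\/)|\/\/.*?$|\/\*[\s\S]*?\*\//gm;

// "Hardcore" version from the answer: a regex literal is only accepted after
// a non-word, non-space character (or at line start) and when followed by
// something that looks like flags and a delimiter.
const HARDCORE_RE = /((["'])(?:\\[\s\S]|.)*?\2|(?:[^\w\s]|^)\s*\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\/(?=[gmiy]{0,4}\s*(?![*\/])(?:\W|$)))|\/\/.*?$|\/\*[\s\S]*?\*\//gm;

// Hypothetical helper: collapse runs of whitespace for easier comparison.
const normalize = (s) => s.replace(/\s+/g, " ").trim();

const line = "Code = Code / Code /* Comment */ /g  ; // Comment";

const simple = normalize(line.replace(SIMPLE_RE, "$1"));
const hardcore = normalize(line.replace(HARDCORE_RE, "$1"));

// The first version mistakes "/ Code /" for a regex literal, so the block
// comment that follows is left behind.
console.log(simple);

// The heuristic version does not treat the division as a regex start and
// removes both comments.
console.log(hardcore);
```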
