Finding regular expression literals in a string of JavaScript code

Problem description

I am doing a sort of crude parsing of JavaScript code, with JavaScript. I'll spare the details of why I need to do this, but suffice it to say that I don't want to integrate a huge chunk of library code: it is unnecessary for my purposes, and it is important that I keep this very lightweight and relatively simple. So please don't suggest I use JsLint or anything like that. If the answer is more code than you can paste into your answer, it's probably more than I want.

My code is currently able to do a good job of detecting quoted sections and comments, and then matching braces, brackets and parens (making sure not to be confused by the quotes and comments, or by escapes within quotes, of course). This is all I need it to do, and it does it well, with one exception:

It can be confused by regular expression literals. So I'm hoping for some help with detecting regular expression literals in a string of JavaScript, so that I can handle them appropriately.

Something like this:

function getRegExpLiterals (stringOfJavascriptCode) {
  var output = [];
  // todo!
  return output;
}

var jsString = "var regexp1 = /abcd/g, regexp2 = /efg/;";
console.log(getRegExpLiterals(jsString));

// should print (indices are 0-based offsets into jsString):
// [{startIndex: 14, length: 7}, {startIndex: 33, length: 5}]

Recommended answer

es5-lexer is a JS lexer that uses a very accurate heuristic to distinguish regular expressions in JS code from division expressions. It also provides a token-level transformation that you can use to make sure that the resulting program will be interpreted the same way by a full JS parser as by the lexer.

The bit that determines whether a / starts a regular expression is in guess_is_regexp.js, and the tests start at scanner_test.js line 401. The heuristic looks at the token that precedes the /:

var REGEXP_PRECEDER_TOKEN_RE = new RegExp(
  "^(?:"  // Match the whole tokens below
    + "break"
    + "|case"
    + "|continue"
    + "|delete"
    + "|do"
    + "|else"
    + "|finally"
    + "|in"
    + "|instanceof"
    + "|return"
    + "|throw"
    + "|try"
    + "|typeof"
    + "|void"
    // Binary operators which cannot be followed by a division operator.
    + "|[+]"  // Match + but not ++.  += is handled below.
    + "|-"    // Match - but not --.  -= is handled below.
    + "|[.]"    // Match . but not a number with a trailing decimal.
    + "|[/]"  // Match /, but not a regexp.  /= is handled below.
    + "|,"    // Second binary operand cannot start a division.
    + "|[*]"  // Ditto binary operand.
  + ")$"
  // Or match a token that ends with one of the characters below to match
  // a variety of punctuation tokens.
  // Some of the single char tokens could go above, but putting them below
  // allows closure-compiler's regex optimizer to do a better job.
  // The right column explains why the terminal character to the left can only
  // precede a regexp.
  + "|["
    + "!"  // !           prefix operator operand cannot start with a division
    + "%"  // %           second binary operand cannot start with a division
    + "&"  // &, &&       ditto binary operand
    + "("  // (           expression cannot start with a division
    + ":"  // :           property value, labelled statement, and operand of ?:
           //             cannot start with a division
    + ";"  // ;           statement & for condition cannot start with division
    + "<"  // <, <<, <<   ditto binary operand
    // !=, !==, %=, &&=, &=, *=, +=, -=, /=, <<=, <=, =, ==, ===, >=, >>=, >>>=,
    // ^=, |=, ||=
    // All are binary operands (assignment ops or comparisons) whose right
    // operand cannot start with a division operator
    + "="
    + ">"  // >, >>, >>>  ditto binary operand
    + "?"  // ?           expression in ?: cannot start with a division operator
    + "["  // [           first array value & key expression cannot start with
           //             a division
    + "^"  // ^           ditto binary operand
    + "{"  // {           statement in block and object property key cannot start
           //             with a division
    + "|"  // |, ||       ditto binary operand
    + "}"  // }           PROBLEMATIC: could be an object literal divided or
           //             a block.  More likely to be start of a statement after
           //             a block which cannot start with a /.
    + "~"  // ~           ditto binary operand
  + "]$"
  // The exclusion of ++ and -- from the above is also problematic.
  // Both are prefix and postfix operators.
  // Given that there is rarely a good reason to increment a regular expression
  // and good reason to have a post-increment operator as the left operand of
  // a division (x++ / y) this pattern treats ++ and -- as division preceders.
  );
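
As a rough illustration of how a preceder check like the one above could be wired into the getRegExpLiterals function from the question, here is a minimal sketch. It is not es5-lexer's actual scanner: the names PRECEDER_RE, SKIP_RE, STRING_RE, WORD_RE, REGEXP_RE and PUNCT_RE are made up for this example, the preceder pattern is a condensed restatement of the one above, and the tokenizer is deliberately crude (it assumes ES5-style source and ignores template literals).

```javascript
// Condensed, hypothetical restatement of the preceder pattern: a "/" that
// follows one of these tokens is assumed to start a regexp, not a division.
const PRECEDER_RE = new RegExp(
  "^(?:break|case|continue|delete|do|else|finally|in|instanceof|return" +
  "|throw|try|typeof|void|[+]|-|[.]|[/]|,|[*])$" +
  "|[!%&(:;<=>?\\[^{|}~]$"
);

// Crude token patterns (assumptions made for this sketch only).
const SKIP_RE = /^(?:\s+|\/\/[^\n]*|\/\*[\s\S]*?\*\/)/;       // whitespace, comments
const STRING_RE = /^(?:"(?:[^"\\\n]|\\[\s\S])*"|'(?:[^'\\\n]|\\[\s\S])*')/;
const WORD_RE = /^(?:[A-Za-z_$][\w$]*|\d[\w.]*)/;              // identifiers, keywords, numbers
// Regexp body: ordinary chars, escapes, or character classes (which may hold "/").
const REGEXP_RE = /^\/(?:[^\/\\\[\n]|\\.|\[(?:[^\]\\\n]|\\.)*\])+\/[A-Za-z]*/;
const PUNCT_RE = /^(?:\+\+|--|[\s\S])/;                        // keep ++ and -- whole

function getRegExpLiterals(code) {
  const output = [];
  let lastToken = null; // last significant token seen; null at program start
  let i = 0;
  while (i < code.length) {
    const rest = code.slice(i);
    let m = SKIP_RE.exec(rest);
    if (m) {
      // Whitespace and comments do not change the preceding token.
      i += m[0].length;
      continue;
    }
    const mayBeRegexp = lastToken === null || PRECEDER_RE.test(lastToken);
    m = STRING_RE.exec(rest) || WORD_RE.exec(rest);
    if (!m && rest[0] === "/" && mayBeRegexp) {
      m = REGEXP_RE.exec(rest);
      if (m) output.push({ startIndex: i, length: m[0].length });
    }
    if (!m) m = PUNCT_RE.exec(rest); // always matches one character or ++/--
    lastToken = m[0];
    i += m[0].length;
  }
  return output;
}

console.log(getRegExpLiterals("var regexp1 = /abcd/g, regexp1 = /efg/;"));
```

Because this only inspects the single preceding token, it inherits the ambiguities noted in the comments above (for example after `}`), and it will also misread a regexp that directly follows a closing paren, as in `if (x) /re/.test(y)`. For a crude brace and bracket matcher that trade-off may well be acceptable.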
