How to tokenize/parse string literals from JavaScript source code

Question

I am working on a program in C# that needs to load some JavaScript code, parse it, and do some processing on the string literals found in the code (such as overwriting them with something else).

My problem is that I'm having a difficult time devising an elegant way to find the string literals in the JavaScript code in the first place.

For example, take a look at the sample JavaScript code below. Do you see how even Stack Overflow's code highlighter is able to pick out the string literals in the code and color them red?

I basically want to do the same thing, except that instead of giving the literals a different color, I will do some processing on them and possibly replace them with entirely different string literals.

var dp = {
    sh :                    // dp.sh
    {
        Utils   : {},       // dp.sh.Utils
        Brushes : {},       // dp.sh.Brushes
        Strings : {},
        Version : '1.3.0'
    }
};

dp.sh.Strings = {
    AboutDialog : '<html><head><title>About...</title></head><body class="dp-about"><table cellspacing="0"><tr><td class="copy"><p class="title">dp.SyntaxHighlighter</div><div class="para">Version: {V}</p><p><a href="http://www.dreamprojections.com/syntaxhighlighter/?ref=about" target="_blank">http://www.dreamprojections.com/SyntaxHighlighter</a></p>&copy;2004-2005 Alex Gorbatchev. All right reserved.</td></tr><tr><td class="footer"><input type="button" class="close" value="OK" onClick="window.close()"/></td></tr></table></body></html>',

    // tools
    ExpandCode : '+ expand code',
    ViewPlain : 'view plain',
    Print : 'print',
    CopyToClipboard : 'copy to clipboard',
    About : '?',

    CopiedToClipboard : 'The code is in your clipboard now.'
};

dp.test1 = 'some test blah blah blah' +  someFunction()  + 'asdfasdfsdf';
dp.test2 = 'some test blah blah blah' +  'xxxxx'  + 'asdfasdfsdf';
dp.test3 = 'some test blah blah blah' +  "XXXXsdf "" \" \' ' sdfdff "" \" \' ' asdfASDaSD FASDF SDF'  + 'asdfasdfsdf";

dp.SyntaxHighlighter = dp.sh;

I have tried parsing by looking for quotes, but it gets complicated when a string literal contains escape characters. The other solution I was considering is a regex, but I am not strong enough with regular expressions, and I'm not even sure that is the avenue I should be pursuing.
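
The quote-scanning idea does work once escapes are handled explicitly: when the scanner is inside a literal and sees a backslash, it skips the next character unconditionally, so \' and \" never end the string. Below is a minimal sketch in JavaScript (the language of the sample; the loop ports directly to C#). The function name findStringLiterals is hypothetical. It skips // and /* */ comments so that an apostrophe in a comment is not mistaken for a quote, and it assumes the input contains no template literals (backtick strings) or regex literals, which a real tokenizer would also have to recognise.

```javascript
// Sketch: scan JavaScript source character by character and record every
// '...' or "..." string literal, honouring backslash escapes.
// Assumption: no template literals (`...`) and no regex literals (/.../).
function findStringLiterals(src) {
  const literals = [];
  let i = 0;
  while (i < src.length) {
    const ch = src[i];
    // Skip // line comments so an apostrophe in a comment is not read as a quote.
    if (ch === "/" && src[i + 1] === "/") {
      const nl = src.indexOf("\n", i);
      i = nl === -1 ? src.length : nl;
      continue;
    }
    // Skip /* block comments */ for the same reason.
    if (ch === "/" && src[i + 1] === "*") {
      const close = src.indexOf("*/", i + 2);
      i = close === -1 ? src.length : close + 2;
      continue;
    }
    if (ch === "'" || ch === '"') {
      const start = i;
      i++; // step past the opening quote
      while (i < src.length && src[i] !== ch) {
        // A backslash escapes the next character, whatever it is.
        i += src[i] === "\\" ? 2 : 1;
      }
      i++; // step past the closing quote (or past the end if unterminated)
      // start is the index of the opening quote; end is one past the closing quote.
      literals.push({ start, end: Math.min(i, src.length), text: src.slice(start, i) });
      continue;
    }
    i++;
  }
  return literals;
}

// Usage: the second literal contains escaped quotes; the comment is ignored.
const sample = "var s = 'it\\'s' + \"say \\\"hi\\\"\"; // don't";
for (const lit of findStringLiterals(sample)) {
  console.log(lit.start, lit.end, lit.text);
}
```

Because each literal comes back with its start and end offsets, replacing literals is then a matter of splicing new text into those ranges, working from the last literal to the first so earlier offsets stay valid.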

I would like to see what Stack Overflow thinks. Thanks a bunch!

Recommended answer

Regexs in Depth: Advanced Quoted String Matching has some good examples of how to do this with a regex.

One approach is:

(["'])(?:(?!\1)[^\\]|\\.)*\1
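
To see what the pattern picks up before wiring it into a replacement, here is the same regex in JavaScript with each part annotated, plus a small hypothetical helper, listStringLiterals, that returns every match. Because the escape branch consumes a backslash together with the character after it, an escaped quote never closes the literal. Note that the regex has no notion of comments or regex literals, so a stray apostrophe in, for example, a // comment would start a false match.

```javascript
// The answer's pattern as a JavaScript regex literal:
//   (["'])          capture the opening quote, either ' or "
//   (?:             then repeat, as many times as possible:
//     (?!\1)[^\\]   any character that is neither the captured quote nor a backslash
//     |\\.          or a backslash plus the next character (an escape such as \' or \")
//   )*
//   \1              finally, the same quote character that opened the literal
const stringLiteral = /(["'])(?:(?!\1)[^\\]|\\.)*\1/g;

function listStringLiterals(src) {
  // matchAll requires the g flag; m[0] is the whole literal, quotes included.
  return Array.from(src.matchAll(stringLiteral), m => m[0]);
}

const src = "Version : '1.3.0', x = 'it\\'s', y = \"a 'b' c\"";
console.log(listStringLiterals(src));
```

The third literal shows why the backreference matters: the single quotes inside "a 'b' c" do not end it, because only the captured double quote can close the match.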

You could use it as follows:

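using System.Text.RegularExpressions;

// javascriptText holds the JavaScript source your program has loaded.
string modifiedJavascriptText = Regex.Replace(
    javascriptText,
    @"([""'])(?:(?!\1)[^\\]|\\.)*\1", // in a C# verbatim string, "" is an escaped double quote
    m => m.Value.ToUpper());          // called once per matched literal, quotes included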

In this case, all of the string literals are converted to upper case.
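
For comparison, the same replacement can be written in JavaScript with String.prototype.replace and a callback, which is also a natural place to swap a literal for an entirely different one, as the question asks. The sketch below assumes a hypothetical lookup table, replacements, keyed by a literal's raw contents. Literals not in the table are left alone, and the new contents are re-escaped for the literal's original quote character. It does not handle replacement values that contain newlines, and like the regex itself it can be misled by quote characters inside comments or regex literals.

```javascript
// The same pattern as the C# example; the g flag makes replace() visit every match.
const literalPattern = /(["'])(?:(?!\1)[^\\]|\\.)*\1/g;

// Hypothetical lookup table: raw literal contents (without quotes) -> new contents.
const replacements = {
  "view plain": "show source",
  "print": "print code",
};

function replaceStringLiterals(src, table) {
  // The callback receives the whole match first, then capture group 1 (the quote).
  return src.replace(literalPattern, (literal, quote) => {
    const body = literal.slice(1, -1); // text between the quotes, escapes left as written
    if (!Object.prototype.hasOwnProperty.call(table, body)) {
      return literal; // not in the table: keep the literal exactly as it was
    }
    // Escape backslashes first, then the quote character, so the new literal stays valid.
    const escaped = table[body].replace(/\\/g, "\\\\").split(quote).join("\\" + quote);
    return quote + escaped + quote;
  });
}

const js = "ViewPlain : 'view plain',\n    Print : 'print',\n    About : '?'";
console.log(replaceStringLiterals(js, replacements));
```

Escaping backslashes before quotes matters: doing it the other way round would double the backslashes that were just added in front of the quotes.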
