Parsing VB code with a regex


Problem description


I must create a routine that finds tokens in small, arbitrary VB code
snippets. For example, it might have to find all occurrences of
{Formula}

I was thinking that using regular expressions might be a neat way to solve
this, but I am new to them. Can anyone give me a hint here?

The catch is, it must only find tokens that are not quoted and not
commented; examples follow

(1) should find {Area} (both occurrences) and {Height} in this string
If {Area} > 100 Then Return {Area} Else Return {Height}

(2) should find {Area}, but not {AreaString} in this string
If {Area} = "{AreaString}" 100 Then Return "Found it!"

(3) should find {Height}, but not {Area} in this multi-line string
'the {Area} token is not used here
If {Height} > 1000 Then
Return "Tall"
Else
Return "Short"
End If

I've searched many web sites and libraries, but they all seem to be
interested in finding quoted strings, not avoiding them. I'd appreciate any
help.

Emby

Recommended answer

Mark wrote:
> I must create a routine that finds tokens in small, arbitrary VB code
> snippets. For example, it might have to find all occurrences of
> {Formula}
> The catch is, it must only find tokens that are not quoted and not
> commented; examples follow

Iow, you want to match all {tokens} except those 1) between double
quotes and 2) between a single quote and a line end. Right?

> I've searched many web sites and libraries, but they all seem to be
> interested in finding quoted strings, not avoiding them.




Yeah, I'm not surprised. This is not trivial. Both these regexes use
the same 3 options - IgnoreCase | Multiline |
IgnorePatternWhitespace - and definitely NOT Singleline.

First pass, ignores comments:

(?<! ' .* )                       # Can't be to right of ' char
\{ (?<token> [a-z_0-9]\w+ ) \}    # Capture a {token}

Because Singleline is NOT set, the (?<! ... ) "negative lookbehind
assertion" rules out anything to the right of a ' character on the same line.
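For readers trying this outside .NET, here is a minimal sketch of the same first pass in Python. Python's built-in re module rejects variable-width lookbehind such as (?<! ' .* ), so the sketch swaps in a simpler technique: cut each line at its first ' and search only what is left. The name tokens_outside_comments is made up for this illustration, and, like the first pass above, it does not yet know about quoted strings.

```python
import re

# Same token shape as the post: one char from [a-z_0-9], then one or more \w.
TOKEN = re.compile(r"\{([a-z_0-9]\w+)\}", re.IGNORECASE)


def tokens_outside_comments(code: str) -> list[str]:
    """Return {token} names that are not to the right of a ' on their line."""
    found = []
    for line in code.splitlines():
        # Everything from the first ' onward is treated as a comment.
        code_part = line.split("'", 1)[0]
        found.extend(TOKEN.findall(code_part))
    return found


example_3 = """'the {Area} token is not used here
If {Height} > 1000 Then
Return "Tall"
Else
Return "Short"
End If"""

print(tokens_outside_comments(example_3))
```

Working line by line plays the role that leaving Singleline off plays in the .NET version: the comment can never reach past the end of its own line.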

Second pass, almost there:

(?<! ' .* ) # Can't be to right of ' char
(?<! ^ .* " ) # Can''t be to right of " char

\{ (?<token>[a-z_0-9]\w+ ) \} # Capture a {token}

(?! " .*


) # Can't be to left of " char

.... works on your examples and on

If {Area} = "{AreaString}" 100 Then Return "Found it!" {foo}

but (alas!) it also matches the {foo} in

If {Area} = "{AreaString}" 100 Then Return Found it!" {foo}
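One common way around the unbalanced-quote case, and a different technique from the lookaround regexes above, is to let the regex consume whole string literals and whole comments as alternatives, and report a {token} only when the token branch is the one that matched. The sketch below is in Python and rests on two assumptions that are mine, not the thread's: an unterminated string runs to the end of its line, and "" inside a string is an escaped quote. SCANNER and find_tokens are hypothetical names.

```python
import re

# Alternatives are tried left to right at each position, so a string or a
# comment swallows any {token} inside it before the token branch can see it.
SCANNER = re.compile(
    r'''
      "(?:[^"\n]|"")*(?:"|$)          # string literal; runs to end of line if unterminated
    | '[^\n]*                         # comment: from ' to end of line
    | \{(?P<token>[a-z_0-9]\w+)\}     # {token}, same token pattern as in the post
    ''',
    re.IGNORECASE | re.MULTILINE | re.VERBOSE,
)


def find_tokens(code: str) -> list[str]:
    """Return names of {tokens} that sit outside strings and comments."""
    return [
        m.group("token")
        for m in SCANNER.finditer(code)
        if m.group("token") is not None
    ]


balanced = 'If {Area} = "{AreaString}" 100 Then Return "Found it!" {foo}'
unbalanced = 'If {Area} = "{AreaString}" 100 Then Return Found it!" {foo}'

print(find_tokens(balanced))
print(find_tokens(unbalanced))
```

Because the scan moves left to right and never revisits consumed text, whether a quote opens or closes a string is settled by what came before it, which is the piece a pair of independent lookarounds has trouble tracking.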

I have to go to bed early to volunteer at the polls all day
tomorrow. If you can't take it from here, I'll try to take it
farther on Wednesday. (Or, I'll try to remember to - feel free
to send me mail Tues night or Wedn morning.)
--

.NET 2.0 for Delphi Programmers <http://www.midnightbeach.com/.net>

Delphi skills make .NET easy to learn
Just printed, and shipping now.


Hi Jon,

Thanks for the response. I'm very new to RE, but my hope that I can achieve this task with an RE is waning. I took what you provided (thanks!), and enhanced it a bit to allow for a bit more freedom in token chars (must start with alpha, but allow embedded spaces, dots, hyphen and underscore, and also disallow a space just before the closing brace):

(?<! ' .* ) # Can't be to right of ' char
(?<! ^ .* " ) # Can''t be to right of " char
\{(?<token>[a-z]([a-z_0-9 \-.])*)[^ ]\} # Capture a {token}
(?! " .*
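The token rule described in this reply (start with a letter; allow embedded spaces, dots, hyphens and underscores; no space right before the closing brace) can be tried on its own, separately from the quote and comment handling. The Python sketch below is one possible reading of that rule, not the poster's pattern: as far as I can tell, the posted version keeps [^ ] outside the named group, so the last character of the name would be left out of the capture. TOKEN and token_names are hypothetical names.

```python
import re

# A letter, then optionally more characters where the last one is not a space.
TOKEN = re.compile(
    r"\{(?P<token>[a-z](?:[a-z0-9_ .\-]*[a-z0-9_.\-])?)\}",
    re.IGNORECASE,
)


def token_names(text: str) -> list[str]:
    """Return the names of all well-formed {tokens} in text."""
    return [m.group("token") for m in TOKEN.finditer(text)]


sample = "{Floor Area} {Area } {A} {9x} {net.area-2}"
print(token_names(sample))
```

Here {Area } is rejected because of the trailing space and {9x} because it does not start with a letter; single-letter names such as {A} are accepted, which is a choice made for this sketch.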

