Regular Expression Vs. String Parsing

Question

At the risk of opening a can of worms and getting negative votes, I find myself needing to ask,

When should you use regular expressions, and when is string parsing more appropriate?

And I'm going to need examples and reasoning as to your stance. I'd like you to address things like readability, maintainability, scaling, and probably most of all performance in your answer.

I found another question here in which only one answer even bothered to give an example. I need more to understand this.

I'm currently playing around in C++, but regular expressions are in almost every higher-level language, and I'd also like to know how different languages use and handle regular expressions, though that's more of an afterthought.

Thanks for helping me understand this!

I'm still looking for more examples and discussion on this, but the response so far has been great. :)

Recommended answer

It depends on how complex the language you're dealing with is.

Splitting a string on a delimiter is great when it works, but it only works when there are no escaping conventions. It does not work for CSV, for example, because commas inside quoted strings are not proper split points.

foo,bar,baz

can be split, but

foo,"bar,baz"

cannot.
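To make the CSV point concrete, here is a minimal C++ sketch. The function names split_naive and split_quoted are made up for this illustration, and the quote handling is deliberately simplified (it ignores doubled quotes inside a quoted field), so treat it as a demonstration rather than a real CSV reader.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Naive approach: every comma is a split point, quotes are ignored.
std::vector<std::string> split_naive(const std::string& line) {
    std::vector<std::string> fields(1);
    for (char c : line) {
        if (c == ',') fields.emplace_back();   // start a new field
        else fields.back() += c;
    }
    return fields;
}

// Quote-aware approach: a comma only ends a field outside double quotes.
// Simplifying assumption: doubled quotes ("") inside a field are not handled.
std::vector<std::string> split_quoted(const std::string& line) {
    std::vector<std::string> fields(1);
    bool in_quotes = false;
    for (char c : line) {
        if (c == '"') in_quotes = !in_quotes;  // toggle state, drop the quote
        else if (c == ',' && !in_quotes) fields.emplace_back();
        else fields.back() += c;
    }
    return fields;
}
```

On foo,"bar,baz" the naive version produces three fields, while the quote-aware version produces two, the second being bar,baz. The extra state (in_quotes) is exactly what plain splitting cannot express.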

Regular expressions are great for simple languages that have a "regular grammar". Perl 5 regular expressions are a little more powerful due to back-references, but the general rule of thumb is this:

If you need to match brackets ((...), [...]) or other nesting like HTML tags, then regular expressions by themselves are not sufficient.
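As a rough illustration of why nesting needs more than a regular expression, checking that brackets match takes a stack that can grow with the nesting depth. The sketch below is one way to write that in C++; brackets_balanced is a name invented for this example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns true when every ( and [ is closed by the matching ) or ]
// in the right order. All other characters are ignored.
bool brackets_balanced(const std::string& text) {
    std::vector<char> expected;  // stack of closing brackets still owed
    for (char c : text) {
        if (c == '(') expected.push_back(')');
        else if (c == '[') expected.push_back(']');
        else if (c == ')' || c == ']') {
            if (expected.empty() || expected.back() != c) return false;
            expected.pop_back();
        }
    }
    return expected.empty();
}
```

The stack is the unbounded memory that a classic regular expression does not have: (a[b]c) is accepted, while (a[b)c] and (( are rejected.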

You can use regular expressions to break a string into a known number of chunks, for example pulling out the month/day/year from a date. They are the wrong tool for parsing complicated arithmetic expressions, though.
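The date case might look like the following in C++ with std::regex. The Date struct, the parse_date name, and the M/D/YYYY layout are assumptions made for this sketch; the pattern only checks the shape of the input, not whether the date actually exists.

```cpp
#include <cassert>
#include <regex>
#include <string>

struct Date { int month; int day; int year; };

// Fills `out` and returns true when `text` is exactly M/D/YYYY or MM/DD/YYYY.
// Only the shape is checked; 99/99/2023 would still be accepted.
bool parse_date(const std::string& text, Date& out) {
    static const std::regex pattern(R"((\d{1,2})/(\d{1,2})/(\d{4}))");
    std::smatch m;
    if (!std::regex_match(text, m, pattern)) return false;
    out.month = std::stoi(m[1].str());  // capture group 1: month
    out.day   = std::stoi(m[2].str());  // capture group 2: day
    out.year  = std::stoi(m[3].str());  // capture group 3: year
    return true;
}
```

There are always exactly three chunks and no nesting, which is why a regular expression with capture groups fits here.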

Obviously, if you write a regular expression, walk away for a cup of coffee, come back, and can't easily understand what you just wrote, then you should look for a clearer way to express what you're doing. Email addresses are probably at the limit of what one can correctly and readably handle using regular expressions.

Parser generators and hand-coded pushdown/PEG parsers are great for dealing with more complicated input where you need to handle nesting so you can build a tree or deal with operator precedence or associativity.

Context free parsers often use regular expressions to first break the input into chunks (spaces, identifiers, punctuation, quoted strings) and then use a grammar to turn that stream of chunks into a tree form.
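Here is a small C++ sketch of that two-stage idea for integer arithmetic. It is only an illustration under some assumptions: the names tokenize, Parser and evaluate are invented here, characters the token pattern does not recognise are silently skipped, and the parser computes the value directly instead of building a tree, to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Stage 1: a regular expression breaks the input into chunks (tokens).
// Whitespace and anything else the pattern does not match is skipped.
std::vector<std::string> tokenize(const std::string& input) {
    static const std::regex token(R"(\d+|[-+*/()])");
    std::vector<std::string> tokens;
    for (std::sregex_iterator it(input.begin(), input.end(), token), end; it != end; ++it)
        tokens.push_back(it->str());
    return tokens;
}

// Stage 2: a hand-written recursive-descent parser applies the grammar
//   expr   := term   (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := NUMBER | '(' expr ')'
// The loops give left associativity; term sitting below expr gives
// * and / higher precedence than + and -.
class Parser {
public:
    explicit Parser(std::vector<std::string> tokens) : tokens_(std::move(tokens)) {}

    long parse() {
        long value = expr();
        if (pos_ != tokens_.size()) throw std::runtime_error("unexpected trailing token");
        return value;
    }

private:
    // Consumes the next token if it equals `t`.
    bool accept(const std::string& t) {
        if (pos_ < tokens_.size() && tokens_[pos_] == t) { ++pos_; return true; }
        return false;
    }
    long expr() {
        long value = term();
        for (;;) {
            if (accept("+")) value += term();
            else if (accept("-")) value -= term();
            else return value;
        }
    }
    long term() {
        long value = factor();
        for (;;) {
            if (accept("*")) value *= factor();
            else if (accept("/")) {
                long divisor = factor();
                if (divisor == 0) throw std::runtime_error("division by zero");
                value /= divisor;
            } else return value;
        }
    }
    long factor() {
        if (accept("(")) {               // nesting is handled by recursion
            long value = expr();
            if (!accept(")")) throw std::runtime_error("expected ')'");
            return value;
        }
        if (pos_ >= tokens_.size()) throw std::runtime_error("unexpected end of input");
        return std::stol(tokens_[pos_++]);  // throws if the token is not a number
    }

    std::vector<std::string> tokens_;
    std::size_t pos_ = 0;
};

long evaluate(const std::string& input) { return Parser(tokenize(input)).parse(); }
```

With this sketch, evaluate("1 + 2 * 3") gives 7 and evaluate("(1 + 2) * 3") gives 9. The regular expression only finds the chunks; precedence, associativity and nesting all live in the grammar functions.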

The rule of thumb for CF grammars is

If regular expressions are insufficient but all words in the language have the same meaning regardless of prior declarations, then CF works.

Non context free

If words in your language change meaning depending on context, then you need a more complicated solution. These are almost always hand-coded solutions.

For example, in C,

#ifdef X
  typedef int foo;
#endif

foo * bar;

If foo is a type, then foo * bar is the declaration of a foo pointer named bar. Otherwise, it is a multiplication of a variable named foo by a variable named bar.
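A hand-coded parser typically resolves this by consulting the declarations it has seen so far. The C++ fragment below is a heavily simplified sketch of that idea: Meaning, classify and known_types are hypothetical names, and known_types merely stands in for the symbol table a real C front end would maintain.

```cpp
#include <cassert>
#include <set>
#include <string>

enum class Meaning { Declaration, Multiplication };

// Decides what `name * other` means, given the type names declared so far.
// `known_types` is a stand-in for a real compiler's symbol table.
Meaning classify(const std::string& name, const std::set<std::string>& known_types) {
    return known_types.count(name) ? Meaning::Declaration
                                   : Meaning::Multiplication;
}
```

Before the typedef has been seen, classify("foo", types) reports a multiplication; once "foo" is inserted into the set, the same tokens are reported as a declaration. The answer depends on state outside the grammar, which is what makes the language non context free.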
