Regular Expression Vs. String Parsing


Question

At the risk of opening a can of worms and getting negative votes, I find myself needing to ask:

When should you use regular expressions, and when is string parsing more appropriate?

I'm going to need examples and reasoning to back up your stance. In your answer, I'd like you to address things like readability, maintainability, scaling, and, probably most of all, performance.

I found another question Here, but only one of its answers even bothered to give an example. I need more than that to understand this.

I'm currently playing around in C++, but regular expressions are in almost every higher-level language, and I'd also like to know how different languages use and handle them, though that's more of an afterthought.

Thanks for helping me understand!

I'm still looking for more examples and discussion on this, but the response so far has been great. :)

Recommended answer

It depends on how complex the language you're dealing with is.

Splitting a string on a delimiter is great when it works, but it only works when there are no escaping conventions. It does not work for CSV, for example, because commas inside quoted strings are not proper split points.

foo,bar,baz

can be split, but

foo,"bar,baz"

cannot.
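To make the difference concrete, here is a minimal sketch in Python (chosen only for brevity; the question itself is about C++). The helper names `naive_split` and `csv_split` are made up for this illustration. It compares plain splitting with the standard `csv` module, which knows about the quoting convention.

```python
import csv

def naive_split(line):
    # Plain string splitting: every comma is treated as a field boundary.
    return line.split(",")

def csv_split(line):
    # csv.reader understands quoted fields, so a comma inside quotes
    # stays part of the field instead of ending it.
    return next(csv.reader([line]))

print(naive_split('foo,bar,baz'))    # ['foo', 'bar', 'baz']
print(naive_split('foo,"bar,baz"'))  # ['foo', '"bar', 'baz"']
print(csv_split('foo,"bar,baz"'))    # ['foo', 'bar,baz']
```

The naive version cuts the quoted field in two and leaves the quote characters in the data, which is exactly the failure described above.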

Regular expressions are great for simple languages that have a "regular grammar". Perl 5 regular expressions are a little more powerful due to back-references, but the general rule of thumb is this:

If you need to match brackets ((...), [...]) or other nesting like HTML tags, then regular expressions by themselves are not sufficient.
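The reason is that checking nesting requires remembering how many brackets are still open, and a classic regular expression has no such memory. As a rough sketch (the function name `is_balanced` is invented for this example), a few lines of hand-written code with an explicit stack handle it:

```python
# Maps each closing bracket to the opening bracket it must match.
PAIRS = {")": "(", "]": "["}

def is_balanced(text):
    stack = []  # opening brackets seen so far that are not yet closed
    for ch in text:
        if ch in "([":
            stack.append(ch)
        elif ch in PAIRS:
            # A closer must match the most recently opened bracket.
            if not stack or stack.pop() != PAIRS[ch]:
                return False
    # Anything left on the stack was opened but never closed.
    return not stack

print(is_balanced("(a[b](c))"))  # True
print(is_balanced("(a[b)]"))     # False
```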

You can use regular expressions to break a string into a known number of chunks, for example, pulling the month, day, and year out of a date. They are the wrong tool for parsing complicated arithmetic expressions, though.
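A small sketch of the date case, assuming a month/day/year layout with a four-digit year (the pattern and the name `parse_date` are illustrative assumptions, not something from the original answer):

```python
import re

# Three capture groups for a fixed, known number of chunks.
DATE_RE = re.compile(r"(\d{1,2})/(\d{1,2})/(\d{4})")

def parse_date(text):
    m = DATE_RE.fullmatch(text)
    if m is None:
        return None  # the input does not have the expected shape
    month, day, year = (int(g) for g in m.groups())
    # Note: this checks only the shape, not whether the date exists.
    return month, day, year

print(parse_date("7/4/1976"))    # (7, 4, 1976)
print(parse_date("not a date"))  # None
```

The pattern stays readable because the number of pieces is fixed; nothing here needs to recurse or count.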

Obviously, if you write a regular expression, walk away for a cup of coffee, come back, and can't easily understand what you just wrote, then you should look for a clearer way to express what you're doing. Email addresses are probably at the limit of what one can correctly and readably handle using regular expressions.

Parser generators and hand-coded pushdown/PEG parsers are great for dealing with more complicated input where you need to handle nesting, so that you can build a tree or deal with operator precedence or associativity.

Context-free parsers often use regular expressions to first break the input into chunks (spaces, identifiers, punctuation, quoted strings) and then use a grammar to turn that stream of chunks into a tree.
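The two-stage approach can be sketched as follows: a regular expression does the chunking, and a small hand-written recursive-descent parser turns the chunks into a tree that respects precedence and nesting. The grammar, the tuple-based tree, and the names `tokenize` and `parse` are assumptions made for this illustration.

```python
import re

# Stage 1: a regular expression splits the input into tokens.
TOKEN_RE = re.compile(r"\s*(\d+|[-+*/()])")

def tokenize(text):
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input near position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

# Stage 2: a grammar turns the token stream into a tree.
def parse(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def expr():    # expr := term (('+' | '-') term)*
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())  # left-associative
        return node

    def term():    # term := factor (('*' | '/') factor)*
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def factor():  # factor := NUMBER | '(' expr ')'
        tok = peek()
        if tok is None:
            raise ValueError("unexpected end of input")
        advance()
        if tok == "(":
            node = expr()
            if peek() != ")":
                raise ValueError("expected ')'")
            advance()
            return node
        if tok.isdigit():
            return int(tok)
        raise ValueError(f"unexpected token {tok!r}")

    tree = expr()
    if peek() is not None:
        raise ValueError(f"unexpected token {peek()!r}")
    return tree

print(parse(tokenize("1 + 2 * 3")))    # ('+', 1, ('*', 2, 3))
print(parse(tokenize("(1 + 2) * 3")))  # ('*', ('+', 1, 2), 3)
```

The regular expression never has to know about precedence or parentheses; that knowledge lives entirely in the three grammar functions.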

The rule of thumb for context-free (CF) grammars is:

If regular expressions are insufficient, but all words in the language have the same meaning regardless of prior declarations, then a CF grammar works.

Non context free

If words in your language change meaning depending on context, then you need a more complicated solution. These are almost always hand-coded solutions.

For example, in C,

#ifdef X
  typedef int foo;
#endif

foo * bar;

If foo is a type, then foo * bar is the declaration of a pointer to foo named bar. Otherwise, it is a multiplication of a variable named foo by a variable named bar.
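A toy sketch of why this needs hand-written logic: the shape of the statement can be recognised with a regular expression, but its meaning depends on a table of type names collected from earlier declarations. The function `classify` and its `typedef_names` parameter are hypothetical and stand in for what a real C front end does with its symbol table.

```python
import re

# Recognises only the shape "identifier * identifier", optionally
# followed by a semicolon.
STATEMENT_RE = re.compile(
    r"\s*([A-Za-z_]\w*)\s*\*\s*([A-Za-z_]\w*)\s*;?\s*"
)

def classify(statement, typedef_names):
    m = STATEMENT_RE.fullmatch(statement)
    if m is None:
        raise ValueError("expected something of the form 'a * b'")
    left, right = m.groups()
    # The same tokens mean different things depending on what was
    # declared earlier; the pattern alone cannot decide this.
    if left in typedef_names:
        return f"declare {right} as pointer to {left}"
    return f"multiply {left} by {right}"

print(classify("foo * bar", {"foo"}))  # declare bar as pointer to foo
print(classify("foo * bar", set()))    # multiply foo by bar
```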
