Why are there so many different regular expression dialects?

Question

I'm wondering why there have to be so many regular expression dialects. Why do so many languages seem bent on writing their own, rather than reusing a tried-and-true dialect?

Like this.

I mean, I understand that some of these do have very different backends. But shouldn't that be abstracted away from the programmer?

I'm referring more to the odd but small differences, such as parentheses that have to be escaped in one language but are literals in another, or metacharacters that mean somewhat different things.

Is there any particular reason we can't have some sort of universal dialect for regular expressions? I would think it would make things much easier for programmers who have to work in multiple languages.

Recommended answer

Because regular expressions, in their original formal definition, have only three operations:

  • Concatenation
  • Union |
  • Kleene closure *

Everything else is an extension or syntactic sugar, so there is no common source to standardize on. Things like capturing groups, backreferences, character classes, cardinality operations, etc. are all additions to the original definition of regular expressions.

Some of these extensions make "regular expressions" no longer regular at all. With these extras they can recognize non-regular languages, but we still call them regular expressions regardless.
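To make that concrete, here is a minimal sketch in Python's `re` module (my choice of dialect for illustration; the answer itself names none). The language "some number of a's, one b, then the same number of a's" is a textbook non-regular language, yet one backreference handles it. The name `is_mirrored` is made up for this example.

```python
import re

# a^n b a^n: the same number of a's on both sides of a single b.
# No finite automaton recognizes this language, but a backreference
# (\1, "whatever group 1 matched") can.
MIRRORED = re.compile(r"(a+)b\1")


def is_mirrored(text: str) -> bool:
    """Return True if text is n a's, one b, then n a's again (n >= 1)."""
    return MIRRORED.fullmatch(text) is not None


if __name__ == "__main__":
    print(is_mirrored("aaabaaa"))  # three a's on each side
    print(is_mirrored("aabaaa"))   # two on the left, three on the right
```

Backreference syntax is itself one of the things that varies between dialects, so treat `\1` here as Python's spelling rather than a universal one.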

As people add more extensions, they often try to borrow from other common variants of regular expressions. That's why nearly every dialect uses X+ to mean "one or more Xs", which itself is just a shortcut for writing XX*.
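The "just a shortcut" claim can be checked by brute force. This is a rough sketch, again assuming Python's `re`: it compares a sugared pattern against a rewrite that uses only concatenation, `|` and `*`, over every short string on a small alphabet. The helper `same_language` and the pattern table are hypothetical names for this example, and `(?:...)` is merely Python's way of writing plain grouping.

```python
import re
from itertools import product

# Each pair: (sugared pattern, rewrite using only concatenation, | and *).
# (?:...) is plain grouping in Python's dialect, not an extra operation.
EQUIVALENT = [
    (r"a+", r"aa*"),           # one or more  ->  one, then zero or more
    (r"[abc]", r"(?:a|b|c)"),  # character class  ->  union
    (r"a?b", r"(?:ab|b)"),     # optional  ->  union of both cases
]


def same_language(p: str, q: str, alphabet: str = "abc", max_len: int = 4) -> bool:
    """Check that p and q accept exactly the same strings up to max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if bool(re.fullmatch(p, s)) != bool(re.fullmatch(q, s)):
                return False
    return True


if __name__ == "__main__":
    for sugared, core in EQUIVALENT:
        print(sugared, core, same_language(sugared, core))
```

This only tests strings up to a fixed length, so it is evidence rather than proof, but it shows why such shorthand spreads easily between dialects: it adds convenience without adding power.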

But when new features get added, there's no basis for standardization, so someone has to make something up. If more than one group of designers comes up with similar ideas at around the same time, they'll end up with different dialects.
