What is a regular expression?
Question
I know this question seems stupid, but it isn't. I mean: what is it, exactly? I have a fair understanding of the parsing problem. I know BNF/EBNF, and I wrote a grammar to parse simple context-free languages in one of my college courses. I have just never met regular expressions before! The only thing I remember about them is that a context-free grammar can do everything a regular expression can do.
Also, are they useful for parsing strings in everyday coding? A simple example would be helpful.
Recommended answer
Regular expressions first came around in mathematics and automata theory. A regular expression is simply something which defines a regular language. Without going too much into what "regular" means, think of a language this way:
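- A language is made up of strings. English, for example, is a language, and it is made up of strings.
- Those strings are made up of symbols, and the set of available symbols is called an alphabet. So a string is just a concatenation of symbols from the alphabet.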
So a given string (which is, remember, just a concatenation of symbols) might be in a given language, or it might not.
So let's say you have an alphabet made of 2 symbols: "0" and "1". And let's say you want to create a language using the symbols in that alphabet. You could create the following rule: "In order for a string to be in my language, it must have only 0's and 1's in it."
So these strings are in your language:
- 0
- 1
- 01
- 11001101
- ...etc.
These are not in your language:
- 2
- peaches
- 00101105
That's a pretty simple language. How about this: "In my language, each string [analogous to a valid 'word' in English] must begin with a 0, and then can be followed by any number of 0's or 1's."
These are in the language:
- 0111111
- 0000000
- 0101010110001
These are not:
- 1
- 10000
- 1010
- 2000000
Rather than defining the language using words (and these languages might get very complex: "1 followed by 2 0's followed by any combination of 1's and 0's ending with a 1"), we came up with this syntax called "regular expressions" to define the language.
The first language would be:
(0|1)*
(0 or 1, repeated any number of times)
The next:
0(0|1)*
(0, followed by any number of 0's and 1's).
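To make the idea of a string being "in" a language concrete, here is a minimal Python sketch using the standard `re` module. The names `ONLY_BITS`, `STARTS_WITH_ZERO`, `COMPLEX_RULE`, and `in_language` are made up for this illustration, and `COMPLEX_RULE` is just one possible reading of the wordier rule quoted earlier ("1 followed by 2 0's ... ending with a 1").

```python
import re

# The languages described above, written as Python regex patterns.
ONLY_BITS = re.compile(r"(0|1)*")           # only 0's and 1's
STARTS_WITH_ZERO = re.compile(r"0(0|1)*")   # a 0, then any number of 0's and 1's
COMPLEX_RULE = re.compile(r"100(0|1)*1")    # 1, two 0's, any bits, ending with a 1


def in_language(pattern, s):
    """Return True if the whole string s belongs to the language."""
    # fullmatch requires the pattern to cover the entire string,
    # which is what membership in a language means.
    return pattern.fullmatch(s) is not None


for s in ["0", "01", "11001101", "2", "peaches", "00101105"]:
    print(repr(s), in_language(ONLY_BITS, s))

for s in ["0111111", "0101010110001", "1", "10000", "2000000"]:
    print(repr(s), in_language(STARTS_WITH_ZERO, s))

for s in ["1001", "100101", "1000"]:
    print(repr(s), in_language(COMPLEX_RULE, s))
```

One detail worth noticing: `*` means "zero or more", so `(0|1)*` also accepts the empty string. Using `fullmatch` rather than `search` matters here, because `search` would report a hit as soon as any part of the string fits the pattern.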
So let's think of programming now. When you create a regex, you are saying "Look at this text. Return to me strings which match this pattern." Which is really saying "I have defined a language. Return to me all strings within this document which are in my language."
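As a rough sketch of "return to me all strings within this document which are in my language", the snippet below searches a made-up sample text for whole tokens that belong to the second language (a 0 followed by any number of 0's and 1's). The sample text and variable names are assumptions for illustration only. The character class `[01]` is used as an equivalent of `(0|1)`, because Python's `findall` returns the contents of capturing groups instead of the whole match when a pattern contains them.

```python
import re

# Hypothetical sample document; tokens are separated by spaces.
text = "ids: 0101 1010 0000 2000000 0 peaches 0111"

# \b marks a word boundary, so each match must be a whole token
# rather than a fragment inside a longer token such as "2000000".
pattern = re.compile(r"\b0[01]*\b")

matches = pattern.findall(text)
print(matches)  # ['0101', '0000', '0', '0111']
```

This is the everyday use the question asks about: the regex defines the language once, and the library does the scanning.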
So when you create a "regex", you are actually defining a regular language, which is a mathematical concept. (In actuality, Perl-like regexes can define "nonregular" languages, but that is a separate issue.)
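For the curious, one feature that takes practical regex engines beyond regular languages is the backreference. The sketch below uses Python's `re` module (not Perl itself, though it supports the same feature); the name `DOUBLED` is invented for this example. The pattern accepts a non-empty bit string written twice in a row, a language that no purely regular expression can describe.

```python
import re

# \1 is a backreference: it must match exactly the text captured by group 1.
# The pattern therefore accepts strings of the form ww, where w is a
# non-empty string of 0's and 1's.
DOUBLED = re.compile(r"([01]+)\1")

for s in ["0101", "110110", "0110", "011"]:
    print(repr(s), DOUBLED.fullmatch(s) is not None)
```

Here "0101" and "110110" are accepted ("01" twice, "110" twice), while "0110" and "011" are rejected.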
By learning the syntax of regex, you are learning the ins and outs of how to create a language, so that later you can see if a given string is "in" the language. Thus, commonly, people say that regexes are for pattern matching, which is basically what you are doing when you look at a string and see if it "matches" the rules for your language.
(This was long. Does it answer your question at all?)