什么是常规语言? [英] What is a regular language?
问题描述
我正在尝试理解语言级别的概念(常规、上下文无关、上下文敏感等).
I'm trying to understand the concept of languages levels (regular, context free, context sensitive, etc.).
我可以很容易地查到这一点,但我发现的所有解释都是一堆符号和关于集合的内容.我有两个问题:
I can look this up easily, but all explanations I find are a load of symbols and talk about sets. I have two questions:
您能否用文字描述什么是常规语言,以及这些语言有何不同?
Can you describe in words what a regular language is, and how the languages differ?
人们从哪里学习理解这些东西?据我了解,它是形式数学?我在大学有几门课程使用了它,几乎没有人理解它,因为导师只是假设我们知道它.我在哪里可以学习它,为什么人们期望"在这么多来源中了解它?就好像教育有差距一样.
Where do people learn to understand this stuff? As I understand it, it is formal mathematics? I had a couple of courses at uni which used it and barely anyone understood it as the tutors just assumed we knew it. Where can I learn it and why are people "expected" to know it in so many sources? It's like there's a gap in education.
这是一个示例:
属于该集合的任何语言都是字母表上的常规语言.
Any language belonging to this set is a regular language over the alphabet.
语言怎么能超越"一切?
How can a language be "over" anything?
推荐答案
在计算机科学的上下文中,词是符号的串联.使用的符号称为字母.例如,某些由字母 {0,1,2,3,4,5,6,7,8,9}
组成的单词将是 1
、2
、12
、543
、1000
和 002
.
In the context of computer science, a word is the concatenation of symbols. The used symbols are called the alphabet. For example, some words formed out of the alphabet {0,1,2,3,4,5,6,7,8,9}
would be 1
, 2
, 12
, 543
, 1000
, and 002
.
语言 是所有可能单词的子集.例如,我们可能想要定义一种语言来捕获所有精英 MI6 特工.这些都以双 0 开头,因此语言中的单词将是 007
、001
、005
和 0012
>,但不是 07
或 15
.为简单起见,我们说语言是一个字母表";而不是由字母表中中的符号串联形成的单词子集".
A language is then a subset of all possible words. For example, we might want to define a language that captures all elite MI6 agents. Those all start with double-0, so words in the language would be 007
, 001
, 005
, and 0012
, but not 07
or 15
. For simplicity's sake, we say a language is "over an alphabet" instead of "a subset of words formed by concatenation of symbols in an alphabet".
在计算机科学中,我们现在想要对语言进行分类.如果可以通过一个接一个地检查单词中的所有符号来确定一个单词是否在具有算法/具有恒定(有限)内存的机器的语言中,我们称该语言为regular.仅由单词 42
组成的语言是规则的,因为您可以决定一个单词是否在其中,而不需要任意数量的内存;您只需检查第一个符号是否为 4,第二个是否为 2,以及后面是否还有其他数字.
In computer science, we now want to classify languages. We call a language regular if it can be decided if a word is in the language with an algorithm/a machine with constant (finite) memory by examining all symbols in the word one after another. The language consisting just of the word 42
is regular, as you can decide whether a word is in it without requiring arbitrary amounts of memory; you just check whether the first symbol is 4, whether the second is 2, and whether any more numbers follow.
所有单词数量有限的语言都是规则的,因为我们(理论上)可以只构建一个恒定大小的控制流树(你可以将它想象成一堆嵌套的 if
语句一个接一个地检查一个数字).例如,我们可以测试一个单词是否在10 到 99 之间的质数"中.具有以下结构的语言,除了用于编码我们当前所在代码行的内存外,不需要任何内存:
All languages with a finite number of words are regular, because we can (in theory) just build a control flow tree of constant size (you can visualize it as a bunch of nested if
-statements that examine one digit after the other). For example, we can test whether a word is in the "prime numbers between 10 and 99" language with the following construct, requiring no memory except the one to encode at which code line we're currently at:
if word[0] == 1:
if word[1] == 1: # 11
return true # "accept" word, i.e. it's in the language
if word[1] == 3: # 13
return true
...
return false
注意所有的有限语言都是正则的,但并不是所有的正则语言都是有限的;我们的双 0 语言包含无限多的单词(007
、008
,还有 004242
和 0012345
),但可以用常量内存测试:要测试一个词是否属于其中,检查第一个符号是否为0
,第二个符号是否为0
.如果是这样,请接受它.如果单词短于三个或不以 00
开头,则它不是 MI6 代码名称.
Note that all finite languages are regular, but not all regular languages are finite; our double-0 language contains an infinite number of words (007
, 008
, but also 004242
and 0012345
), but can be tested with constant memory: To test whether a word belongs in it, check whether the first symbol is 0
, and whether the second symbol is 0
. If that's the case, accept it. If the word is shorter than three or does not start with 00
, it's not an MI6 code name.
形式上,有限状态机或正则语法 用于证明一种语言是正则的.这些类似于上面的 if
语句,但允许任意长的单词.如果有一个有限状态机,也有一个正则文法,反之亦然,所以显示两者就足够了.例如,我们的双 0 语言的有限状态机是:
Formally, the construct of a finite-state machine or a regular grammar is used to prove that a language is regular. These are similar to the if
-statements above, but allow for arbitrarily long words. If there's a finite-state machine, there is also a regular grammar, and vice versa, so it's sufficient to show either. For example, the finite state machine for our double-0 language is:
start state: if input = 0 then goto state 2
start state: if input = 1 then fail
start state: if input = 2 then fail
...
state 2: if input = 0 then accept
state 2: if input != 0 then fail
accept: for any input, accept
等效的正则语法是:
start → 0 B
B → 0 accept
accept → 0 accept
accept → 1 accept
...
等效的正则表达式是:
00[0-9]*
有些语言不常规.例如,任意数量的1
的语言,后面跟着相同数量的2
(通常写成1n2n,对于任意 n) 是不规则的 - 您需要的内存数量超过恒定数量(= 恒定数量的状态)来存储 1
s 来决定一个词是否在语言中.
Some languages are not regular. For example, the language of any number of 1
, followed by the same number of 2
(often written as 1n2n, for an arbitrary n) is not regular - you need more than a constant amount of memory (= a constant number of states) to store the number of 1
s to decide whether a word is in the language.
这通常应该在理论计算机科学课程中进行解释.幸运的是,维基百科解释了正式和常规语言 非常好.
This should usually be explained in the theoretical computer science course. Luckily, Wikipedia explains both formal and regular languages quite nicely.
这篇关于什么是常规语言?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!