在 Prolog 中使用差异列表的这种上下文无关语法如何运作? [英] How is this context free grammar using difference lists in Prolog functioning?

查看:12
本文介绍了在 Prolog 中使用差异列表的这种上下文无关语法如何运作?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在阅读有关 Prolog 中上下文无关语法的教程,他们在页面底部提到使用差异列表在 Prolog 中实现上下文无关语法,包括以下代码块:

I'm reading this tutorial on context free grammars in Prolog, and they mention at the bottom of the page implementing a context free grammar in Prolog using difference lists, with the following code block included:

s(X,Z):-  np(X,Y),  vp(Y,Z). 

np(X,Z):-  det(X,Y),  n(Y,Z). 

vp(X,Z):-    v(X,Y),  np(Y,Z). 
vp(X,Z):-    v(X,Z). 

det([the|W],W). 
det([a|W],W). 

n([woman|W],W). 
n([man|W],W). 

v([shoots|W],W).

它提到:

仔细考虑这些规则.例如,s 规则说:我知道列表 X 和 Z 表示一个句子,如果 (1) 我可以消费 X 并留下一个 Y,并且 X 和 Y 对表示一个名词短语,并且 (2)然后我可以继续消费 Y 留下 Z,YZ 对代表一个动词短语.np 规则和第二个 vp 规则的工作方式类似.

Consider these rules carefully. For example, the s rule says: I know that the pair of lists X and Z represents a sentence if (1) I can consume X and leave behind a Y , and the pair X and Y represents a noun phrase, and (2) I can then go on to consume Y leaving Z behind , and the pair Y Z represents a verb phrase . The np rule and the second of the vp rules work similarly.

将第一个列表视为需要消耗的内容(或者如果您愿意:输入列表),将第二个列表视为我们应该留下的内容(或:输出列表).从这个(相当程序)的角度来看,差异列表

Think of the first list as what needs to be consumed (or if you prefer: the input list ), and the second list as what we should leave behind (or: the output list ). Viewed from this (rather procedural) perspective the difference list

[a,woman,shoots,a,man]  [].

代表一个女人射杀一个男人的句子,因为它说:如果我把左边的符号都吃光了,把右边的符号都留下,那么我就有了我感兴趣的句子.也就是我们的句子感兴趣的是这两个列表内容的区别.

represents the sentence a woman shoots a man because it says: If I consume all the symbols on the left, and leave behind the symbols on the right, then I have the sentence I am interested in. That is, the sentence we are interested in is the difference between the contents of these two lists.

这就是我们重写识别器所需要了解的关于差异列表的全部内容.如果我们简单地记住消费某物并留下某物的想法,我们会得到以下识别器:

That’s all we need to know about difference lists to rewrite our recogniser. If we simply bear in mind the idea of consuming something, and leaving something behind in mind, we obtain the following recogniser:

作为一种解释,但这并没有做任何事情来向我澄清这段代码.我知道它比使用 append/3 更有效,但至于消耗东西并留下其他东西的概念,它似乎比以前的 append/3 更加混乱实施,我只是无法做出正面或反面.此外,当我将该代码复制并粘贴到 Prolog 可视化工具 以尝试理解它时,可视化工具会说是错误.有人可以对此有所了解吗?

As an explanation, but that just isn't doing anything to clarify this code to me. I understand it's more efficient than using append/3, but as for the notion of consuming things and leaving others behind, it seems so much more confusing than the previous append/3 implementation, and I just can't make heads or tails of it. Furthermore when I copy and paste that code into a Prolog visualizer to try to understand it, the visualizer says there are errors. Could anyone shed some light on this?

推荐答案

作为事实列出

让我们尝试用一个反例来解释这一点.让我们用简单的事实来指定名词、动词等:

Listing as facts

Lets try to explain this with a counter example. Let's specify the nouns, verbs, etc. with simple facts:

det(the). 
det(a). 

n(woman). 
n(man). 

v(shoots).

现在我们可以实现一个名词短语 np为:

Now we can implement a noun phrase np as:

np([X,Y]) :-
    det(X),
    n(Y).

换句话说,我们说名词短语是一个有两个词的句子,第一个是 det,第二个是 n".这将起作用:如果我们查询 np([a,woman]),它将成功等等.

In other words we say "a noun phrase is a sentence with two words the first being a det, the second a n". And this will work: if we query np([a,woman]), it will succeed etc.

但是现在我们需要做一些更高级的事情,定义动词短语.有两种可能的动词短语:一个带有动词和名词短语的短语,最初定义为:

But now we need to do something more advance, define the verb phrase. There are two possible verb phrases: the one with a verb and a noun phrase that was originally defined as:

vp(X,Z):- v(X,Y),np(Y,Z).

我们可以这样定义:

vp([X|Y]) :-
    v(X),
    np(Y).

只有一个动词:

vp(X,Z):- v(X,Z).

可以转化为:

vp([X]) :-
    v(X).

猜谜题

然而,问题在于两种变体的单词数量不同:动词短语有一个单词和三个单词.这不是一个真正的问题,但现在说 - 我知道这不是正确的英语 - 存在一个定义为 vp 后跟 np 的句子,所以这将是:

The guessing problem

The problem however is that both variants have a different number of words: there are verb phrases with one word and with three words. That's not really a problem, but now say - I know this is not correct English - there exists a sentence defined as vp followed by np, so this would be:

s(X,Z):-  vp(X,Y),  np(Y,Z).

在原始语法中.

问题在于,如果我们想将其转换为新的表示方式,我们需要知道 vp消耗多少(多少单词将被 vp 吃掉).我们无法提前知道这一点:因为此时我们对句子了解不多,所以我们无法猜测 vp 是否会吃掉一个或三个单词.

The problem is that if we want to transform this into our new way of representing it, we need to know how much vp will consume (how much words will be eaten by vp). We cannot know this in advance: since at this point we don't know much about the sentence, we cannot make a guess whether vp will eat one or three words.

我们当然可以猜测单词的数量:

We might of course guess the number of words with:

s([X|Y]) :-
    vp([X]),
    np(Y).
s([X,Y,Z|T]) :-
    vp([X,Y,Z]),
    np(Z).

但我希望你能想象一下,如果你用 1、3、5 和 7 个词来定义动词短语,事情就会有问题.解决此问题的另一种方法是将其留给 Prolog:

But I hope you can imagine that if you would define verb phrases with 1, 3, 5 and 7 words, things will get problematic. Another way to solve this, is leave this to Prolog:

s(S) :-
    append(VP,NP,S),
    vp(VP),
    np(NP).

现在 Prolog 会猜测如何先将句子细分成两部分,然后再尝试匹配每个部分.但问题是对于一个有n个单词的句子,有n个断点.

Now Prolog will guess how to subdivide the sentence into two parts first and then try to match every part. But the problem is that for a sentence with n words, there are n breakpoints.

所以 Prolog 将首先将其拆分为:

So Prolog will for instance first split it as:

VP=[],NP=[shoots,the,man,the,woman]

(记住我们交换了动词短语和名词短语的顺序).显然,如果 vp 得到一个空字符串,它不会很高兴.所以很容易被拒绝.但接下来它说:

(remember we swapped the order of the verb phrase and noun phrase). Evidently vp will not be very happy if it gets an empty string. So that will be rejected easily. But next it says:

VP=[shoots],NP=[the,man,the,woman]

现在 vp 只对 shoots 感到满意,但实现这一点需要一些计算工作.然而,np 对这个冗长的部分并不感兴趣.所以 Prolog 再次回溯:

Now vp is happy with only shoots, but it will require some computational effort to realize that. np is however not amused with this long part. So Prolog backtracks again:

VP=[shoots,the],NP=[man,the,woman]

现在 vp 会再次抱怨它被赋予了太多的单词.最后 Prolog 会正确分割它:

now vp will complain again that it has been given too much words. Finally Prolog will split it correctly with:

VP=[shoots,the,woman],NP=[the,woman]

关键是它需要大量的猜测.对于这些猜测中的每一个,vpnp 也需要工作.对于真正复杂的语法,vpnp 可能会进一步拆分句子,从而导致大量的试错.

The point is that it requires a large number of guesses. And for each of these guesses vp and np will require work as well. For a real complicated grammar, vp and np might split the sentence further resulting in a huge amount of try-and-error.

真正的原因是 append/3 没有语义"线索如何拆分句子,因此它尝试了所有可能性.然而,人们对 vp 可以提供有关它真正想要的句子份额的信息的方法更感兴趣.

The true reason is the append/3 has no "semantical" clue how to split the sentence, so it tries all possibilities. One is however more interested in an approach where vp could provide information about what share of the sentence it really wants.

此外,如果您必须将句子分成 3 个部分,则执行此操作的方法数量甚至会增加到 O(n^2) 等等.所以猜测是行不通的.

Furthermore if you have to split the sentence into 3 parts, the number of ways to do this even increases to O(n^2) and so on. So guessing will not do the trick.

你也可以尝试生成一个随机的动词短语,然后希望动词短语匹配:

You might also try to generate a random verb phrase, and then hope the verb phrase matches:

s(S) :-
    vp(VP),
    append(VP,NP,S),
    np(NP).

但在这种情况下,猜测的动词短语的数量将呈指数级增长.当然你可以给出提示"等来加快这个过程,但这仍然需要一些时间.

But in that case the number of guessed verb phrases will blow up exponentially. Of course you can give "hints" etc. to speed up the process, but it will still take some time.

您要做的是为每个谓词提供句子的一部分,使谓词看起来像:

What you want to do is to provide the part of the sentence to every predicate such that a predicate looks like:

predicate(Subsentence,Remaining)

Subsentence 是一个以该谓词开头的单词列表.例如,对于一个名词短语,它可能看起来像 [the,woman,shoots,the,man].每个谓词使用它感兴趣的词:直到某个点的词.在这种情况下,名词短语只对 ['the','woman'] 感兴趣,因为那是名词短语.为了进行剩余的解析,它返回剩余的部分[shoots,the,woman],希望其他谓词可以消耗句子的剩余部分.

Subsentence is a list of words that start for that predicate. For instance for a noun phrase, it might look like [the,woman,shoots,the,man]. Every predicate consumes the words it is interested in: the words up to a certain point. In this case, the noun-phrase is only interested in ['the','woman'], because that is a noun-phrase. To do the remaining parsing, it returns the remaining part [shoots,the,woman], in the hope that some other predicate can consume the remainder of the sentence.

我们的事实表很简单:

det([the|W],W). 
det([a|W],W). 

n([woman|W],W). 
n([man|W],W). 

v([shoots|W],W).

因此,这意味着如果您查询一个 setence:[the,woman,shoots,...],然后您询问 det/2 这是否是一个限定符,它会说:是的,the 是一个限定符,但是剩下的部分[woman,shoots,...]"不是这个限定符的一部分,请匹配它与其他东西.

It thus means that if you query a setence: [the,woman,shoots,...], and you ask the det/2 whether that's a determiner, it will say: "yes, the is a determiner, but the remaining part [woman,shoots,...]" is not part of the determiner, please match it with something else.

完成此匹配是因为列表表示为链表.[the,woman,shoots,...],实际上表示为[the|[woman|[shoots|...]]](所以它指向下一个子列表").如果匹配:

This matching is done because a list is represented as a linked list. [the,woman,shoots,...], is actually represented as [the|[woman|[shoots|...]]] (so it points to the next "sublist"). If you match:

    [the|[woman|[shoots|...]]]
det([the|W]                   ,W)

它将 [woman|[shoots|...]]W 统一,从而导致:

It will unify [woman|[shoots|...]] with W and thus result in:

det([the|[woman|[shoots|...]],[woman|[shoots|...]]).

因此返回 remaining 列表,它因此 消耗 部分.

Thus returning the remaining list, it has thus consumed the the part.

现在如果我们定义名词短语:

Now in case we define the noun phrase:

np(X,Z):-  det(X,Y),  n(Y,Z).

我们再次调用 [the,woman,shoots,...],它将查询统一的 X 与该列表.它将首先调用 det 来消耗 the,而无需任何回溯.接下来 Y 等于 [woman,shoots,...],现在 n/2 将消耗 woman 并返回 [射击,...].这也是 np 将返回的结果,另一个谓词将不得不使用它.

And we again call with [the,woman,shoots,...], it will query unifying X with that list. It will first call det that will consume the, without any need for backtracking. Next Y is equal to [woman,shoots,...], now the n/2 will consume woman and return [shoots,...]. This is also the result the np will return, and another predicate will have to consume this.

假设您将您的名字作为附加名词介绍:

Say you introduce your name as an additional noun:

n([doug,smith|W],W).

(很抱歉使用了小案例,但 Prolog 将这些视为变量).

(sorry for using small cases, but otherwise Prolog sees these as variables).

它只会尝试将前两个单词与 dougsmith 进行匹配,如果成功,则尝试匹配句子的其余部分.所以可以这样写一个句子:[the,doug,smith,shoots,the,woman](对不起,在英语中,一些名词短语直接映射到一个名词np(X,Y) :- n(X,Y) 因此可以删除 the 以获得更复杂的英语语法).

It will simply try to match the first two words with doug and smith, and if that succeeds, try to match the remaining of the sentence. So one can make a setence like: [the,doug,smith,shoots,the,woman] (sorry about that, furthermore in English, some noun phrases map to a noun directly np(X,Y) :- n(X,Y) so the the can be removed for a more complex English grammar).

猜测完全消除了吗?.消费上仍有重叠的可能.例如,您可以添加:

Is the guessing completely eliminated? No. It is still possible that there is overlap in consuming. For instance you might add:

n([doug,smith|W],W).
n([doug|W],W).

在这种情况下,如果您查询 [the,doug,smith,shoots,the,woman].它首先会消费/吃掉 det 中的内容,然后它会从 [doug,smith,...] 中寻找要消费的名词.有两个候选人.Prolog 将首先尝试只吃 doug,并将 [smith,shoots,...] 作为一个完整的动词短语匹配,但由于 smith不是动词,它会回溯,重新考虑只吃一个词,因此决定把 dougsmith 都当作名词来吃.

In that case if you query for [the,doug,smith,shoots,the,woman]. It will first consume/eat the in det, next it will look for a noun to consume from [doug,smith,...]. There are two candidates. Prolog will first try to eat only doug, and match [smith,shoots,...] as an entire verb phrase, but since smith is not a verb, it will backtrack, reconsider eating only one word and thus decinde to eat both doug and smith as a noun instead.

关键是当使用 append 时,Prolog 也必须回溯.

The point is that when using append, Prolog would have to backtrack as well.

通过使用差异列表,您可以吃任意数量的单词.返回剩余部分,以便其他句子部分(如动词短语)旨在消耗剩余部分.而且这个列表总是完全有根据的,所以绝对不会先用蛮力生成各种动词短语.

By using difference lists, you can eat an arbitrary amount of words. The remainder is returned such that other sentence parts like the verb phrase aim to consume the remainder. The list is furthermore always fully grounded, so one definitely does not use brute force to generate all kinds of verb phrases first.

这篇关于在 Prolog 中使用差异列表的这种上下文无关语法如何运作?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆