CYK算法如何工作? [英] How does the CYK algorithm work?

查看:155
本文介绍了CYK算法如何工作?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我必须检查是否可以从给定的上下文中以Chomsky正常形式派生一个字符串.我正在使用C ++.

I have to check if a string can be derived from a given context free that is in Chomsky normal form. I'm using C++.

Wikipedia文章中关于CYK算法的伪代码非常好,但是我不太了解.

There is very nice pseudocode on the Wikipedia article covering the CYK algorithm, but I can't understand it very well.

有人会为我提供另一种用于CYK算法的伪代码来帮助我,还是在Wiki文章中解释该代码?

Would someone be so kind to help me out by giving me another pseudocode for CYK algorithm, or maybe explain the one in the wiki article?

推荐答案

CYK算法以Chomsky正常形式的CFG作为输入.这意味着每个产品都具有以下形式

The CYK algorithm takes as input a CFG that's in Chomsky normal form. That means that every production either has the form

  • S→ a,对于某些终端机a,或
  • S→ AB,对于某些非终端A和B.

现在,假设您有一个字符串w,并且想要查看是否可以从其起始符号为S的语法中得出该字符串.有两种选择:

Now, imagine you have a string w and you want to see whether you can derive it from a grammar whose start symbol is S. There are two options:

  1. 如果w是单个字符长,则解析它的唯一方法是使用形式为S→的产生式.对于某些字符a.因此,请查看是否有任何单字符产品都匹配a.
  2. 如果w长于一个字符,则解析它的唯一方法是使用形式为S→的产生式. AB对于某些非终结点A和B.这意味着我们需要将字符串w分为两个部分x和y,其中A派生x,而B派生y.一种方法是尝试将w分为两部分的所有可能方法,并查看它们中的任何一种是否有效.

请注意,这里的选项(2)最终成为递归解析问题:要查看是否可以从S派生w,请查看是否可以从A派生x和从B派生y.

Notice that option (2) here ends up being a recursive parsing problem: to see whether you can derive w from S, see whether you can derive x from A and y from B.

有了这种见识,这里是递归函数的伪代码,您可以用来查看非终结符S是否派生字符串w:

With that insight, here's pseudocode for recursive function you can use to see whether a nonterminal S derives a string w:

bool canDerive(nonterminal S, string w) {
    return canDeriveRec(S, w, 0, w.size());
}

/* Can you derive the substring [start, end) of w from S? */
bool canDeriveRec(nonterminal S, string w, int start, int end) {
    /* Base case: Single characters */
    if (end - start == 1) {
        return whether there is a production S -> a, where a = w[start];
    }

    /* Recursive case: Try all possible splits */
    for (each production S -> AB) {
        for (int mid = start + 1; mid < end; mid++) {
            if (canDeriveRec(A, w, start, mid) &&
                canDeriveRec(B, w, mid, end)) {
                return true;
            }
        }
    }
    return false;
}

该算法可以正常工作,但是如果您绘制出递归的形状,您会发现

This algorithm works correctly, but if you map out the shape of the recursion you'll find that

  1. 它进行了大量的冗余递归调用,但是
  2. 没有很多不同的递归调用.

实际上,可能的不同调用次数为O(n 2 N),其中n是输入字符串的长度(对于起始索引和结束索引的每种可能组合)和N是语法中非终结符的数量.这些观察结果表明,该算法将从记忆化或动态编程中受益,这取决于您碰巧认为哪种方法更好.

In fact, the number of distinct possible calls is O(n2 N), where n is the length of the input string (for each possible combination of a start and end index) and N is the number of nonterminals in the grammar. These observations suggest that this algorithm would benefit either from memoization or dynamic programming, depending on which approach you happen to think is nicer.

当您采用上述递归算法并记忆结果时,或者将上述递归算法转换为动态规划问题时,等效的就是CYK算法.

The CYK algorithm is what you get when you take the above recursive algorithm and memoize the result, or equivalently when you convert the above recursive algorithm into a dynamic programming problem.

有O(n 2 N)个可能的递归调用.对于每个尝试的生产,它都会执行O(n)工作.如果对于一个非终端平均有P个生产,这意味着整个运行时间为O(n 3 NP),对于固定时间为O(n 3 )语法.

There are O(n2 N) possible recursive calls. For each production tried, it does O(n) work. If there are P productions, on average, for a nonterminal, this means the overall runtime is O(n3 NP), which is O(n3) for a fixed grammar.

这篇关于CYK算法如何工作?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆