从CYK算法生成分析树的步骤(自然语言处理) [英] Steps to generate parse tree from CYK Algorithm (Natural Language Processing)

查看:934
本文介绍了从CYK算法生成分析树的步骤(自然语言处理)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我目前正在从事一个涉及NLP的项目.我已经实现了Jurafsky和Martin(在第450页的算法)中给出的CKY标识符.这样生成的表实际上将非终结符存储在表中(而不是通常的布尔值).但是,我唯一遇到的问题是检索解析树.

I am currently working on a project involving NLP. I have implemented a CKY identifier as given in Jurafsky and Martin (algorithm on page 450). The table so produced actually stores the nonterminals in the table (instead of the usual boolean values). However, the only issue I am getting is to retrieve the parse tree.

以下是我的CKY标识符的作用说明:

Here is an illustration of what my CKY identifier does:

这是我的语法

          S -> NP VP 
          S -> VP 
          NP -> MODAL PRON | DET NP | NOUN VF | NOUN | DET NOUN | DET FILENAME
          MODAL -> 'MD'
          PRON -> 'PPSS' | 'PPO'
          VP -> VERB NP
          VP -> VERB VP
          VP -> ADVERB VP
          VP -> VF
          VERB -> 'VB' | 'VBN'
          NOUN -> 'NN' | 'NP'
          VF -> VERB FILENAME
          FILENAME -> 'NN' | 'NP'
          ADVERB -> 'RB'
          DET -> 'AT'

这是算法:

for j  from i to LENGTH(words) do
    table[j-1,j] = A where A -> POS(word[j])
    for i from j-2 downto 0
        for k from i+1 to j-1
            table[i,j] = Union(table[i,j], A such that A->BC)
            where B is in table[i,k] and C is in table[k,j]

这是我的解析表在填充后的样子:

And this is what my parsing table looks like after being filled:

现在我知道由于S驻留在[0,5]中,因此字符串已被解析,并且对于k = 1(根据Martin和Jurafsky中给出的算法),我们有S-> table [0 ] [2]表[2] [5] 即S-> NP VP

Now that I know that since S resides in [0,5], the string has been parsed, and that for k = 1 (as per the algorithm given in Martin and Jurafsky), we have S -> table[0][2] table[2][5] i.e. S -> NP VP

我得到的唯一问题是我已经能够检索到所使用的规则,但是它们却是混乱的格式,即不是基于它们在解析树中的外观.有人可以建议一种算法来检索正确的解析树吗?

The only issue I am getting is that I have been able to retrieve the rules used, but then they are in a jumbled format, i.e. not on the basis of their appearance in parse tree. Can someone suggest an algorithm to retrieve the correct parse tree?

谢谢.

推荐答案

您应该以递归方式访问表的单元格,并以与S节点相同的方式展开它们,直到所有内容都成为终端为止(因此,请不要还有其他事情要展开).在您的示例中,您首先转到单元格[0] [2];这是一个终端,您无需执行任何操作.接下来,您转到[2] [5],这是由[2] [3]和[3] [5]构成的非端子.您访问[2] [3],它是一个终点站. [3] [5]是一个非终端,由两个终端组成.大功告成这是Python的演示:

You should visit recursively the cells of your table and unfold them in the same way you did for the S node until everything is a terminal (so you don't have anything else to unfold). In your example, you first go to cell [0][2]; this is a terminal, you don't have to do anything. Next you go to [2][5], this is a non-terminal made by [2][3] and [3][5]. You visit [2][3], it's a terminal. [3][5] is a non-terminal, made by two terminals. You are done. Here is a demo in Python:

class Node:
    '''Think this as a cell in your table'''
    def __init__(self, left, right, type, word):
        self.left = left
        self.right = right
        self.type = type
        self.word = word

# Declare terminals
t1 = Node(None,None,'MOD','can')
t2 = Node(None,None,'PRON','you')
t3 = Node(None,None,'VERB', 'eat')
t4 = Node(None,None,'DET', 'a')
t5 = Node(None,None,'NOUN','flower')

# Declare non-terminals
nt1 = Node(t1,t2, 'NP', None)
nt2 = Node(t4,t5, 'NP', None)
nt3 = Node(t3,nt2,'VP', None)
nt4 = Node(nt1,nt3,'S', None)

def unfold(node):
    # Check for a terminal
    if node.left == None and node.right == None:
        return node.word+"_"+node.type

    return "["+unfold(node.left)+" "+unfold(node.right)+"]_"+node.type

print unfold(nt4)

输出:

[[can_MOD you_PRON]_NP [eat_VERB [a_DET flower_NOUN]_NP]_VP]_S

这篇关于从CYK算法生成分析树的步骤(自然语言处理)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆