Horizontal Markovization

Question

I have to implement horizontal markovization (NLP concept) and I'm having a little trouble understanding what the trees will look like. I've been reading the Klein and Manning paper, but they don't explain what the trees with horizontal markovization of order 2 or order 3 will look like. Could someone shed some light on the algorithm and what the trees are SUPPOSED to look like? I'm relatively new to NLP.

Answer

So, let's say you have a bunch of flat rules like:

NP 
    NNP
    NNP
    NNP
    NNP

VP
   V
   Det
   NP

When you binarize these, you want to keep the context (i.e. this isn't just a Det, but specifically a Det following a Verb as part of a VP). To do so, you normally use annotations like this:

NP 
    NNP
    NP->NNP
        NNP
        NP->NNP->NNP
            NNP
            NP->NNP->NNP->NNP
                NNP

VP
   V
   VP->V
       Det
       VP->V->Det
          NP


You need to binarize the tree, but these annotations are not always very meaningful. They might be somewhat meaningful for the Verb Phrase example, but all you really care about for the other one is that a noun phrase can be a fairly long string of proper nouns (e.g. "Peter B. Lewis Building" or "Hope Memorial Bridge Project Anniversary"). So with Horizontal Markovization you will collapse some of the annotations slightly, throwing away some of the context. The order of Markovization is the amount of context you are going to retain. So with the normal annotations you are basically at infinite order: choosing to retain all context and collapse nothing.

Order 0 means you're going to drop all of the context and you get a tree without the fancy annotations, like this:

NP 
    NNP
    NNP
        NNP
        NNP
            NNP
            NNP
                NNP

Order 1 means you'll retain only one term of context and you get a tree like this:

NP 
    NNP
    NP->...NNP  **one term: NP->**
        NNP
        NP->...NNP  **one term: NP->**
            NNP
            NP->...NNP  **one term: NP->**
                NNP

Order 2 means you'll retain two terms of context and you get a tree like this:

NP 
    NNP
    NP->NNP  **two terms: NP->NNP**
        NNP
        NP->NNP->...NNP  **two terms: NP->NNP->**
            NNP
            NP->NNP->...NNP  **two terms: NP->NNP->**
                NNP
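
To make the collapsing concrete, here is a matching sketch (markovize is my own hypothetical helper, not code from the paper or any particular parser): it keeps the parent category plus only the h most recently generated sibling categories, and writes "..." where the older history has been forgotten.

def markovize(label, h, sep="->"):
    # Collapse a full annotation such as NP->NNP->NNP->NNP down to
    # horizontal order h: keep the parent plus the h most recently
    # generated sibling categories; "..." marks the forgotten history.
    parent, *siblings = label.split(sep)
    if len(siblings) <= h:
        return label        # short history: nothing to forget
    if h == 0:
        return parent       # order 0: only the parent category survives
    return parent + sep + "..." + sep.join(siblings[-h:])

for h in (0, 1, 2):
    print(h, markovize("NP->NNP->NNP->NNP", h))
# 0 NP
# 1 NP->...NNP
# 2 NP->...NNP->NNP

Since every child here is an NNP, keeping the most recent siblings retains the same context as the labels drawn above; only the placement of the "..." marker differs, and at order 0 the bare parent category is one common way to write the unannotated intermediate symbol. If you end up using NLTK, Tree.chomsky_normal_form does the binarization and the collapsing in one step via its horzMarkov parameter, though its intermediate labels are written in a different style (something like NP|<NNP-NNP>).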
