Mixing words and PoS tags in NLTK parser grammars

Question

I've been playing with NLTK for a while and am now at the point of defining a custom parser grammar for special chunking. I am following the description in http://nltk.googlecode.com/svn/trunk/doc/book/ch07.html, but what I want to do is slightly different from what the chapter describes. For instance, in example 7.10, instead of using the following rule for the verb phrase:

VP: {<VB.*><NP|PP|CLAUSE>+$}

I would like to match only sentences that use one particular verb rather than any verb. Something like:

VP: {go<NP|PP|CLAUSE>+$}

In other words, I would like to match the actual word rather than its PoS tag, and to mix and match actual words and PoS tags in the same regular expression.

Is this possible?

Answer

Not with the standard PoS tags churned out by the NLTK PoS tagger.

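If you need to write grammars for different verbs, a useful hack is to preprocess the tags and append the token to the tag of every verb. You could then use a regex string that looks like VP: {<VB.*-go><NP|PP|CLAUSE>+$} (assuming here that the word is joined to its tag with a hyphen, so that "go" tagged VBP becomes VBP-go).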
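The preprocessing hack can be sketched as below. This is a minimal illustration, not code from the original answer: the function name `lexicalize_verb_tags`, the hyphen separator, lowercasing the word, and treating every tag that starts with "VB" as a verb are all assumptions. The input is assumed to be a list of (word, tag) pairs such as `nltk.pos_tag` returns.

```python
# Hypothetical helper illustrating the "append the token to the tag" hack.
# Assumptions: input is a list of (word, tag) pairs, verb tags start with "VB",
# and the word is lowercased and joined to the tag with a hyphen.

def lexicalize_verb_tags(tagged, sep="-"):
    """Return a new list where each verb tag carries its word,
    e.g. ("Go", "VBP") -> ("Go", "VBP-go"). Non-verb pairs are unchanged."""
    result = []
    for word, tag in tagged:
        if tag.startswith("VB"):
            tag = f"{tag}{sep}{word.lower()}"
        result.append((word, tag))
    return result


# Chunk rule that only fires when the verb's surface form is exactly "go".
# NP, PP and CLAUSE are assumed to come from earlier stages of a cascaded
# grammar, as in example 7.10 of the NLTK book.
GRAMMAR = r"VP: {<VB.*-go><NP|PP|CLAUSE>+$}"

if __name__ == "__main__":
    sentence = [("I", "PRP"), ("go", "VBP"), ("to", "TO"), ("school", "NN")]
    print(lexicalize_verb_tags(sentence))
```

The rewritten list can then be passed to `nltk.RegexpParser(GRAMMAR).parse(...)` in place of the original tagged sentence. Existing rules written as `<VB.*>` should keep matching, because `.*` also covers the appended "-word" suffix. Since the tag holds the surface form rather than the lemma, "went" or "goes" would not match `-go`; you would need to list those forms or lemmatize before appending.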
