Matching words with NLTK's chunk parser


Question

NLTK's chunk parser's regular expressions can match POS tags, but can they also match specific words?
Suppose I want to chunk any structure consisting of a noun followed by the verb "left" (call this pattern L). For example, the sentence "the/DT dog/NN left/VB" should be chunked as
(S (DT the) (L (NN dog) (VB left))), but the sentence "the/DT dog/NN slept/VB" shouldn't be chunked at all.

I haven't been able to find any documentation on the chunking regex syntax, and all the examples I've seen only match POS tags.

Recommended answer

I had a similar problem. After realizing that the regex pattern only examines tags, I changed the tag on the piece I was interested in.
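Applied to the question's example, the retagging idea might look like the sketch below. It is only an illustration under stated assumptions: the helper names `retag_words` and `chunk`, the custom tag `LEFT`, and the hand-written POS tags are all hypothetical and not part of NLTK; only `nltk.RegexpParser` and its `parse` method come from the library.

```python
def retag_words(tagged_tokens, word_tags):
    """Return a copy of tagged_tokens with the tag swapped for chosen words.

    word_tags maps a lowercase word to the custom tag it should receive.
    """
    return [(word, word_tags.get(word.lower(), tag)) for word, tag in tagged_tokens]


def chunk(tagged_tokens, grammar):
    """Chunk already-tagged tokens with NLTK's RegexpParser."""
    # Imported lazily so retag_words stays usable without NLTK installed.
    from nltk import RegexpParser
    return RegexpParser(grammar).parse(tagged_tokens)


# Hypothetical custom tag for the word "left"; any otherwise unused tag name would do.
WORD_TAGS = {"left": "LEFT"}
# Chunk label L: any noun tag followed by the custom LEFT tag.
GRAMMAR = "L: {<NN.*><LEFT>}"

# Hand-tagged input, as in the question; a real tagger may assign different tags.
left_sentence = retag_words([("the", "DT"), ("dog", "NN"), ("left", "VB")], WORD_TAGS)
slept_sentence = retag_words([("the", "DT"), ("dog", "NN"), ("slept", "VB")], WORD_TAGS)
```

Calling `chunk(left_sentence, GRAMMAR)` should then place "dog" and "left" inside an L chunk, while `chunk(slept_sentence, GRAMMAR)` should leave the sentence unchunked. Note that this sketch retags every occurrence of "left", including non-verb uses such as "left hand"; checking the original tag before swapping it would be a natural refinement.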

For example, I was trying to match a product name and version. A chunk rule like <NNP>+<CD> worked for "Internet Explorer 8.0" but failed on "Internet Explorer 8.0 SP2", where the tagger tagged SP2 as an NNP.

Perhaps I could have trained a POS tagger, but I decided instead to just change the tag to SP; a chunk rule like <NNP>+<CD><SP>* then matches either example.
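A minimal sketch of that workaround follows, with the rule written in NLTK's angle-bracket tag-pattern syntax. The `SP\d+` token shape, the chunk label `PRODUCT`, the helper names, and the hand-written tags are assumptions made for illustration; the answer does not say how the tag was actually changed.

```python
import re

# Assumed shape of a service-pack token such as "SP2".
SERVICE_PACK = re.compile(r"SP\d+")


def tag_service_packs(tagged_tokens):
    """Replace the tag of service-pack tokens with the custom tag SP."""
    return [
        (word, "SP") if SERVICE_PACK.fullmatch(word) else (word, tag)
        for word, tag in tagged_tokens
    ]


def chunk_products(tagged_tokens):
    """Chunk product names: proper nouns, a number, then optional SP tokens."""
    # Imported lazily so the retagging helper works without NLTK installed.
    from nltk import RegexpParser
    return RegexpParser("PRODUCT: {<NNP>+<CD><SP>*}").parse(tagged_tokens)


# Hand-tagged input mirroring the answer's description of the tagger output.
plain = tag_service_packs(
    [("Internet", "NNP"), ("Explorer", "NNP"), ("8.0", "CD")]
)
with_sp = tag_service_packs(
    [("Internet", "NNP"), ("Explorer", "NNP"), ("8.0", "CD"), ("SP2", "NNP")]
)
```

Passing either `plain` or `with_sp` to `chunk_products` should yield a single PRODUCT chunk covering the whole name, since `<SP>*` allows zero or more service-pack tokens after the version number.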
