CFG using POS tags in NLTK

Question

I am trying to check if a given sentence is grammatical using NLTK.

For example:

OK : The whale licks the sadness

NOT OK : The best I ever had

I know that I could do POS tagging, then use a CFG parser and check that way, but I have yet to find a CFG that uses POS tags instead of actual words as terminal branches.

Is there a CFG that anyone can recommend? I think that making my own is silly, because I am not a linguist and will probably leave out important structures.

Also, my application is such that the system would ideally reject many sentences and only approve sentences it is extremely sure of.

Thanks :D

Recommended answer

The terminal nodes of a CFG can be anything, including POS tags. As long as your phrasal rules take POS tags rather than words as input, there should be no problem declaring the grammar over POS tags.

import nltk

# Define the CFG; the terminals are POS tags rather than words.
# In NLTK 3, nltk.CFG.fromstring replaces the older nltk.parse_cfg.
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> 'DT' 'NN'
VP -> 'VB'
VP -> 'VB' 'NN'
""")

# Make your POS sentence into a list of tokens.
sentence = "DT NN VB NN".split(" ")

# Load the grammar into the ChartParser.
cp = nltk.ChartParser(grammar)

# Print every parse the grammar yields for the tag sequence.
# In NLTK 3, parse() replaces the older nbest_parse() and returns an iterator.
for tree in cp.parse(sentence):
    print(tree)
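
Since the question asks for a yes/no check that rejects anything it is not sure of, the parser can be wrapped in a small predicate. The following is only a minimal sketch: the grammar is a toy one invented for illustration, and the names TOY_GRAMMAR, PARSER and is_grammatical are hypothetical, not part of NLTK. It assumes NLTK 3 and Penn Treebank style tags.

```python
import nltk

# Toy grammar for illustration only; its terminals are Penn Treebank POS tags.
# The rules are assumptions made for this sketch, not a vetted English grammar.
TOY_GRAMMAR = nltk.CFG.fromstring("""
S -> NP VP
NP -> 'DT' 'NN'
VP -> V | V NP
V -> 'VB' | 'VBZ'
""")
PARSER = nltk.ChartParser(TOY_GRAMMAR)


def is_grammatical(tags):
    """Return True if the toy grammar yields at least one parse for the tags."""
    try:
        # parse() returns an iterator of trees; one tree is enough to accept.
        return next(iter(PARSER.parse(tags)), None) is not None
    except ValueError:
        # ChartParser raises ValueError when a tag is not a terminal of the
        # grammar; such sequences are treated as rejected.
        return False


# Tag sequence for "The whale licks the sadness".
print(is_grammatical("DT NN VBZ DT NN".split()))
# A tag sequence containing tags the toy grammar does not know.
print(is_grammatical("DT JJS PRP RB VBD".split()))
```

In practice the tag list could come from nltk.pos_tag(nltk.word_tokenize(text)), which needs the tokenizer and tagger models to be downloaded first. A grammar this small rejects almost everything, which fits the stated goal of approving only sentences the system is very sure of, but it will also reject many perfectly good sentences.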
