CFG using POS tags in NLTK
Question
I am trying to check whether a given sentence is grammatical using NLTK.
For example:
OK: The whale licks the sadness
NOT OK: The best I ever had
I know that I could do POS tagging and then check the tag sequence with a CFG parser, but I have yet to find a CFG that uses POS tags instead of actual words as its terminals.
Is there a CFG that anyone can recommend? Writing my own seems unwise, because I am not a linguist and would probably leave out important structures.
Also, in my application the system should ideally reject many sentences and approve only those it is extremely sure of.
Thanks :D
Recommended answer
The terminal nodes of a CFG can be anything, including POS tags. As long as your phrasal rules take POS tags rather than words as input, there is no problem declaring the grammar over POS tags.
import nltk

# Define the CFG; the quoted terminals are POS tags, not words.
# (nltk.parse_cfg was removed in NLTK 3; CFG.fromstring replaces it.)
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> 'DT' 'NN'
VP -> 'VB'
VP -> 'VB' 'NN'
""")

# Turn the POS-tag sentence into a list of tokens.
sentence = "DT NN VB NN".split(" ")

# Load the grammar into the ChartParser.
cp = nltk.ChartParser(grammar)

# Print every parse of the tag sequence (nbest_parse became parse in NLTK 3).
for tree in cp.parse(sentence):
    print(tree)
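
To turn the parser into the accept/reject check the question asks for, something like the following sketch might work. It assumes NLTK 3 and a sentence that has already been tagged as (word, tag) pairs; the tags below are written by hand as an assumption about what a Penn Treebank tagger such as nltk.pos_tag would return, and the toy grammar and the is_grammatical helper are hypothetical, not part of NLTK. Note that ChartParser.parse raises ValueError when a token is not a terminal of the grammar, so the sketch treats that case as a rejection.

```python
import nltk

# Toy grammar over Penn Treebank tags (illustrative only, far from complete).
GRAMMAR = nltk.CFG.fromstring("""
S -> NP VP
NP -> 'DT' 'NN'
VP -> 'VBZ'
VP -> 'VBZ' NP
""")
PARSER = nltk.ChartParser(GRAMMAR)


def is_grammatical(tagged_sentence):
    """Return True if the tag sequence has at least one parse (hypothetical helper)."""
    tags = [tag for _word, tag in tagged_sentence]
    try:
        # parse() yields trees lazily; one tree is enough to accept.
        return any(True for _tree in PARSER.parse(tags))
    except ValueError:
        # Raised when a tag is not a terminal of the grammar: reject.
        return False


# Hand-written tags, assumed to match typical tagger output.
ok = [("The", "DT"), ("whale", "NN"), ("licks", "VBZ"),
      ("the", "DT"), ("sadness", "NN")]
not_ok = [("The", "DT"), ("best", "JJS"), ("I", "PRP"),
          ("ever", "RB"), ("had", "VBD")]

print(is_grammatical(ok))
print(is_grammatical(not_ok))
```

Because any tag sequence outside the grammar is rejected, a small hand-written grammar errs on the side of refusing sentences, which fits the stated goal of approving only sentences the system is sure of.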