CFG using POS tags in NLTK

Question

I am trying to check if a given sentence is grammatical using NLTK.

For example:

OK : The whale licks the sadness

NOT OK : The best I ever had

I know that I could do POS tagging, then use a CFG parser and check that way, but I have yet to find a CFG that uses POS tags instead of actual words as terminal branches.

Is there a CFG that anyone can recommend? I think that making my own is silly, because I am not a linguist and will probably leave out important structures.

Also, my application is such that the system would ideally reject many sentences and only approve sentences it is extremely sure of.

Thanks :D

Recommended answer

The terminal nodes of a CFG can be anything, even POS tags. As long as your phrasal rules take POS tags rather than words as input, there should be no problem declaring the grammar over POS tags.

import nltk
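# Define the CFG; its terminals are POS tags rather than words.
# (NLTK 3 replaced nltk.parse_cfg with nltk.CFG.fromstring.)
grammar = nltk.CFG.fromstring("""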
S -> NP VP
NP -> 'DT' 'NN'
VP -> 'VB'
VP -> 'VB' 'NN'
""")


# Make your POS sentence into a list of tokens.
sentence = "DT NN VB NN".split(" ")

# Load the grammar into the ChartParser.
cp = nltk.ChartParser(grammar)

# Generate and print every parse of the tag sequence.
# (NLTK 3 replaced nbest_parse with parse, which returns an iterator of trees.)
for tree in cp.parse(sentence):
    print(tree)
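
Since the question is about accepting or rejecting a sentence rather than printing its trees, the parser above can be wrapped in a yes/no check. The following is a minimal sketch, assuming NLTK 3; `is_grammatical` is a hypothetical helper name, not part of NLTK, and the grammar is the same toy grammar, far too small for real text. A tag sequence is accepted only if at least one parse exists, and any tag the grammar does not know causes a rejection, which fits the stated preference for rejecting when unsure.

```python
import nltk

# Same toy grammar as above; the terminals are POS tags, not words.
GRAMMAR = nltk.CFG.fromstring("""
S -> NP VP
NP -> 'DT' 'NN'
VP -> 'VB'
VP -> 'VB' 'NN'
""")
PARSER = nltk.ChartParser(GRAMMAR)


def is_grammatical(tags):
    """Hypothetical helper: True if the tag sequence has at least one parse."""
    try:
        # parse() returns an iterator of trees; one tree is enough to accept.
        return next(iter(PARSER.parse(tags)), None) is not None
    except ValueError:
        # ChartParser raises ValueError when a token is not a terminal
        # of the grammar; treat unknown tags as a rejection.
        return False


print(is_grammatical(["DT", "NN", "VB", "NN"]))  # covered by S -> NP VP
print(is_grammatical(["NN", "DT"]))              # known tags, but no parse
print(is_grammatical(["DT", "JJ", "NN", "VB"]))  # 'JJ' is not in the grammar
```

For raw text, the tag list would typically come from something like `nltk.pos_tag(nltk.word_tokenize(text))`, keeping only the tag from each `(word, tag)` pair; this needs NLTK's tokenizer and tagger data to be downloaded. The tagger emits fine-grained tags such as `VBZ` or `NNS`, so a usable grammar would have to list those tags as terminals as well.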
