NLTK Context-Free Grammar Generation


Question

I'm working on a parser for a non-English language with Unicode characters, and I decided to use NLTK for it.

However, NLTK requires a predefined context-free grammar such as the one below:

  S -> NP VP
  VP -> V NP | V NP PP
  PP -> P NP
  V -> "saw" | "ate" | "walked"
  NP -> "John" | "Mary" | "Bob" | Det N | Det N PP
  Det -> "a" | "an" | "the" | "my"
  N -> "man" | "dog" | "cat" | "telescope" | "park"
  P -> "in" | "on" | "by" | "with" 

In my app, I'm supposed to minimize hard-coding by using a rule-based grammar. For example, I can assume that any word ending in -ed or -ing is a verb, so the grammar should work for any given context.

How can I feed such grammar rules to NLTK? Or should I generate them dynamically using a finite-state machine?

Recommended answer

Maybe you're looking for CFG.fromstring() (formerly parse_cfg())?

From Chapter 8 of the NLTK book, "Analyzing Sentence Structure" (updated to NLTK 3.0):

  import nltk

  # Build the grammar from a multi-line string of productions.
  grammar = nltk.CFG.fromstring("""
    S -> NP VP
    VP -> V NP | V NP PP
    V -> "saw" | "ate"
    NP -> "John" | "Mary" | "Bob" | Det N | Det N PP
    Det -> "a" | "an" | "the" | "my"
    N -> "dog" | "cat" | "cookie" | "park"
    PP -> P NP
    P -> "in" | "on" | "by" | "with"
  """)

  sent = "Mary saw Bob".split()
  rd_parser = nltk.RecursiveDescentParser(grammar)
  for tree in rd_parser.parse(sent):
      print(tree)  # print() is a function in Python 3

  Output:

  (S (NP Mary) (VP (V saw) (NP Bob)))
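
Because CFG.fromstring() takes an ordinary string, one way to approach the "any word ending in -ed or -ing is a verb" idea from the question is to generate the lexical productions from the input tokens before building the grammar. The following is only a minimal sketch under that assumption: SUFFIX_RULES, STRUCTURAL_RULES, lexical_rules, build_grammar_text, and parse are hypothetical names introduced here for illustration, not part of NLTK, and the suffix heuristic will mislabel words such as "king" or "red".

```python
import re

# Hypothetical suffix rules (pattern, category); illustrative only.
SUFFIX_RULES = [
    (re.compile(r".+(ed|ing)$"), "V"),
]

# Fixed structural rules, taken from the grammar in the question,
# with the hard-coded V production left out.
STRUCTURAL_RULES = """
S -> NP VP
VP -> V NP | V NP PP
PP -> P NP
NP -> "John" | "Mary" | "Bob" | Det N | Det N PP
Det -> "a" | "an" | "the" | "my"
N -> "man" | "dog" | "cat" | "telescope" | "park"
P -> "in" | "on" | "by" | "with"
"""


def lexical_rules(tokens):
    """Return one production line per category for tokens matching a suffix rule."""
    words_by_cat = {}
    for tok in tokens:
        for pattern, cat in SUFFIX_RULES:
            if pattern.match(tok):
                words = words_by_cat.setdefault(cat, [])
                if tok not in words:  # avoid duplicate alternatives
                    words.append(tok)
    lines = []
    for cat, words in words_by_cat.items():
        rhs = " | ".join('"%s"' % w for w in words)
        lines.append("%s -> %s" % (cat, rhs))
    return "\n".join(lines)


def build_grammar_text(tokens):
    """Combine the fixed rules with lexical rules derived from the tokens."""
    return STRUCTURAL_RULES + lexical_rules(tokens) + "\n"


def parse(tokens):
    """Parse tokens with a grammar generated for this sentence (needs NLTK 3.x)."""
    import nltk  # imported here so the helpers above work without NLTK

    grammar = nltk.CFG.fromstring(build_grammar_text(tokens))
    parser = nltk.ChartParser(grammar)
    return list(parser.parse(tokens))
```

Calling parse("Mary walked the dog".split()) would then build a grammar in which "walked" is a V without it being listed by hand. Tokens containing a double quote would need escaping before being written into the grammar string, and the parser raises ValueError for any token that no production covers, so every word still needs a category from either the fixed rules or the suffix rules.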
