How to use NLTK to generate sentences from an induced grammar?

Question

I have a (large) list of parsed sentences (which were parsed using the Stanford parser), for example, the sentence "Now you can be entertained" has the following tree:

(ROOT
  (S
    (ADVP (RB Now))
    (, ,)
    (NP (PRP you))
    (VP (MD can)
      (VP (VB be)
        (VP (VBN entertained))))
    (. .)))

I am using the set of sentence trees to induce a grammar using nltk:

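import nltk

# trees: your list of parsed sentences as nltk.Tree objects (name assumed)
allProductions = []
for t in trees:
    # Add the productions of each sentence tree t to allProductions
    allProductions += t.productions()

# Induce the grammar, using S as the start symbol
S = nltk.Nonterminal('S')
grammar = nltk.induce_pcfg(S, allProductions)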
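To make the induction step concrete, here is a minimal sketch that runs the same code on the single example tree from the question. It assumes NLTK 3.x, where `nltk.Tree.fromstring` reads the Stanford parser's bracketed output. `induce_pcfg` gives each production a probability equal to its count divided by the count of its left-hand side, so the three distinct `VP` rules in this one tree each end up with probability 1/3.

```python
import nltk

# The example parse from the question, in Stanford bracketed format
tree_str = """
(ROOT
  (S
    (ADVP (RB Now))
    (, ,)
    (NP (PRP you))
    (VP (MD can)
      (VP (VB be)
        (VP (VBN entertained))))
    (. .)))
"""

trees = [nltk.Tree.fromstring(tree_str)]

# Collect every production used in every tree
allProductions = []
for t in trees:
    allProductions += t.productions()

# Relative-frequency estimate: P(A -> B) = count(A -> B) / count(A)
grammar = nltk.induce_pcfg(nltk.Nonterminal('S'), allProductions)

# Show the VP rules together with their estimated probabilities
for prod in grammar.productions(lhs=nltk.Nonterminal('VP')):
    print(prod)
```

Note that the recursive rules (`VP -> MD VP`, `VP -> VB VP`) already appear with a single sentence, so the induced grammar can produce sentences longer than any in the input.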

Now I would like to use grammar to generate new, random sentences. My hope is that, since the grammar was learned from a specific set of input examples, the generated sentences will be semantically similar. Can I do this in nltk?

If I can't use nltk to do this, do any other tools exist that can take the (possibly reformatted) grammar and generate sentences?

Answer

In NLTK 2.0 you can use nltk.parse.generate to generate all possible sentences for a given grammar.
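In NLTK 3.x the enumerator is `nltk.parse.generate.generate(grammar, start=None, depth=None, n=None)`, which yields each sentence as a list of words. The sketch below runs it on a hypothetical toy grammar, with `n` capping how many sentences are returned. A grammar induced from treebank parses is almost always recursive (the example tree alone yields `VP -> MD VP`), so with such a grammar you should also pass a `depth` bound; otherwise the exhaustive enumeration can recurse without end.

```python
from nltk import CFG
from nltk.parse.generate import generate

# A tiny, non-recursive toy grammar (hypothetical, for illustration only)
toy = CFG.fromstring("""
S -> NP VP
NP -> 'you' | 'they'
VP -> 'can' 'wait' | 'left'
""")

# generate() yields each sentence as a list of terminals; n caps the count
sentences = [' '.join(words) for words in generate(toy, n=10)]
for s in sentences:
    print(s)
```

Enumeration is useful for checking what a small grammar covers, but for a large induced grammar you usually want random samples instead.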

This code defines a function which should generate a single sentence based on the production rules in a (P)CFG.

# This example uses choice to pick one of the possible expansions at random
from random import choice
from nltk.grammar import Nonterminal

# This function is based on _generate_all() in nltk.parse.generate
def generate_sample(grammar, items=None):
    # Start from the grammar's start symbol (S for the grammar induced above)
    if items is None:
        items = [grammar.start()]
    frags = []
    for item in items:
        if isinstance(item, Nonterminal):
            # Pick ONE production for this nonterminal and expand its right-hand side
            prod = choice(grammar.productions(lhs=item))
            frags.extend(generate_sample(grammar, prod.rhs()))
        else:
            # A terminal is a word: emit it unchanged
            frags.append(item)
    return frags

print(' '.join(generate_sample(grammar)))

To make use of the weights in your PCFG, you'll want a better sampling method than choice(), which implicitly assumes all expansions of the current node are equiprobable; instead, pick each production with the probability the PCFG assigns to it.
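One way to do that is a minimal sketch like the one below, assuming NLTK 3.x and Python 3.6+ (for `random.choices`). The helper name `generate_weighted` and the toy PCFG are hypothetical. Instead of `choice()`, it draws each production with weight `prod.prob()`, the probability that `induce_pcfg` stored on it, and expands the chosen right-hand side left to right.

```python
import random
from nltk import PCFG
from nltk.grammar import Nonterminal

def generate_weighted(grammar, symbol=None, rng=random):
    """Sample one sentence, choosing each expansion with its PCFG probability."""
    if symbol is None:
        symbol = grammar.start()
    if not isinstance(symbol, Nonterminal):
        return [symbol]  # terminal: emit the word unchanged
    prods = grammar.productions(lhs=symbol)
    # random.choices draws one production in proportion to the weights
    prod = rng.choices(prods, weights=[p.prob() for p in prods], k=1)[0]
    words = []
    for sym in prod.rhs():
        words.extend(generate_weighted(grammar, sym, rng))
    return words

# Hypothetical toy PCFG: 'you' should be chosen about 90% of the time
toy = PCFG.fromstring("""
S -> NP VP [1.0]
NP -> 'you' [0.9] | 'they' [0.1]
VP -> 'can' 'wait' [0.5] | 'left' [0.5]
""")

rng = random.Random(0)  # seeded so the run is reproducible
samples = [' '.join(generate_weighted(toy, rng=rng)) for _ in range(2000)]
print(samples[:5])
```

With the grammar induced in the question you would call `generate_weighted(grammar)` directly. Frequent rules then dominate the output, which is about as close to "similar to the training sentences" as a context-free model gets.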
