How do I do dependency parsing in NLTK?

Question

Going through the NLTK book, it's not clear how to generate a dependency tree from a given sentence.

The relevant section of the book, the sub-chapter on dependency grammar, gives an example figure, but it doesn't show how to parse a sentence to come up with those relationships. Or maybe I'm missing something fundamental in NLP?

I want something similar to what the Stanford parser does: given the sentence "I shot an elephant in my sleep", it should return something like:

nsubj(shot-2, I-1)
det(elephant-4, an-3)
dobj(shot-2, elephant-4)
prep(shot-2, in-5)
poss(sleep-7, my-6)
pobj(in-5, sleep-7)

Recommended answer

We can use the Stanford Parser through NLTK.

You need to download two things from their website:

  1. The Stanford CoreNLP parser.
  2. The language model for your desired language (e.g. the English language model).

Warning!

Make sure that your language model version matches your Stanford CoreNLP parser version!

The current CoreNLP version as of May 22, 2018 is 3.9.1.
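
One way to catch a version mismatch early is to compare the version strings embedded in the jar filenames. This is only a rough sketch: `jar_version` and `versions_match` are hypothetical helpers, not part of NLTK or CoreNLP, and they assume the version appears in the filename as in `stanford-parser-3.4.1-models.jar`. Some jars (for example a plain `stanford-parser.jar`) carry no version in their name, in which case the check simply reports that it cannot tell.

```python
import re


def jar_version(jar_path):
    """Return the dotted version embedded in a jar filename, or None if absent.

    Hypothetical helper: it only inspects the filename, not the jar contents.
    """
    filename = jar_path.replace("\\", "/").rsplit("/", 1)[-1]
    match = re.search(r"-(\d+(?:\.\d+)+)(?:-models)?\.jar$", filename)
    return match.group(1) if match else None


def versions_match(parser_jar, models_jar):
    """Return True/False if both versions are known, None if either is unknown."""
    parser_version = jar_version(parser_jar)
    models_version = jar_version(models_jar)
    if parser_version is None or models_version is None:
        return None  # cannot decide from the filenames alone
    return parser_version == models_version


# Illustrative filenames; substitute the jars you actually downloaded.
print(versions_match("stanford-corenlp-3.9.1.jar",
                     "stanford-corenlp-3.9.1-models.jar"))
print(versions_match("stanford-parser.jar",
                     "stanford-parser-3.4.1-models.jar"))
```

A `None` result means the filenames do not settle the question, so you would have to check the download page or release notes by hand.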

After downloading the two files, extract the zip file anywhere you like.

Next, load the model and use it through NLTK:

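from nltk.parse.stanford import StanfordDependencyParser

# Adjust both paths to wherever you extracted the downloads.
path_to_jar = 'path_to/stanford-parser-full-2014-08-27/stanford-parser.jar'
path_to_models_jar = 'path_to/stanford-parser-full-2014-08-27/stanford-parser-3.4.1-models.jar'

dependency_parser = StanfordDependencyParser(path_to_jar=path_to_jar, path_to_models_jar=path_to_models_jar)

# raw_parse returns an iterator of DependencyGraph objects; take the first parse.
result = dependency_parser.raw_parse('I shot an elephant in my sleep')
dep = next(result)  # result.next() only works on Python 2

# Each triple is ((head, head_tag), relation, (dependent, dependent_tag)).
list(dep.triples())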

Output

The output of the last line is:

[((u'shot', u'VBD'), u'nsubj', (u'I', u'PRP')),
 ((u'shot', u'VBD'), u'dobj', (u'elephant', u'NN')),
 ((u'elephant', u'NN'), u'det', (u'an', u'DT')),
 ((u'shot', u'VBD'), u'prep', (u'in', u'IN')),
 ((u'in', u'IN'), u'pobj', (u'sleep', u'NN')),
 ((u'sleep', u'NN'), u'poss', (u'my', u'PRP$'))]

I think this is what you want.
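
If you want the `relation(head-i, dependent-j)` lines from the question rather than raw triples, a small formatter can get close. The following is a minimal sketch, not an NLTK feature: `to_stanford_style` is a hypothetical helper, the triples are copied from the output above, and word positions are looked up by first occurrence in a whitespace-split sentence, so it assumes no word is repeated.

```python
def to_stanford_style(triples, tokens):
    """Format ((head, tag), relation, (dependent, tag)) triples as
    relation(head-i, dependent-j), with 1-based word positions.

    Assumption: each word occurs once in tokens, so the first occurrence
    is the right position. Repeated words would need real token indices.
    """
    position = {}
    for index, word in enumerate(tokens, start=1):
        position.setdefault(word, index)

    lines = []
    for (head, _head_tag), relation, (dependent, _dep_tag) in triples:
        lines.append("{}({}-{}, {}-{})".format(
            relation, head, position[head], dependent, position[dependent]))
    return lines


# Triples as shown in the output above (without the Python 2 u'' prefixes).
triples = [
    (('shot', 'VBD'), 'nsubj', ('I', 'PRP')),
    (('shot', 'VBD'), 'dobj', ('elephant', 'NN')),
    (('elephant', 'NN'), 'det', ('an', 'DT')),
    (('shot', 'VBD'), 'prep', ('in', 'IN')),
    (('in', 'IN'), 'pobj', ('sleep', 'NN')),
    (('sleep', 'NN'), 'poss', ('my', 'PRP$')),
]
tokens = 'I shot an elephant in my sleep'.split()

for line in to_stanford_style(triples, tokens):
    print(line)
```

The lines come out in the order of the triples, so the first one is `nsubj(shot-2, I-1)`; sort them if you need a different order.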
