How to use MaltParser in Python NLTK

Question

As part of my academic project, I need to parse a set of arbitrary sentences into dependency graphs. After a lot of searching, I found that I can use MaltParser with its pre-trained grammar to parse the text.

I have downloaded the pre-trained model (engmalt.linear-1.7.mco) from http://www.maltparser.org/mco/mco.html, but I don't know how to parse my sentences using this grammar file and MaltParser (through the Python interface for Malt). I have downloaded the latest version of MaltParser (1.7.2) and moved it to '/usr/lib/'.

import nltk
parser = nltk.parse.malt.MaltParser()
txt = "This is a test sentence"
parser.train_from_file('/home/rohith/malt-1.7.2/engmalt.linear-1.7.mco')
parser.raw_parse(txt)

After executing the last line, the following error message is displayed:

Traceback (most recent call last):
  File "<pyshell#7>", line 1, in <module>
    parser.raw_parse(txt)
  File "/usr/local/lib/python2.7/dist-packages/nltk-2.0b5-py2.7.egg/nltk/parse/malt.py", line 88, in raw_parse
    return self.parse(words, verbose)
  File "/usr/local/lib/python2.7/dist-packages/nltk-2.0b5-py2.7.egg/nltk/parse/malt.py", line 75, in parse
    return self.tagged_parse(taggedwords, verbose)
  File "/usr/local/lib/python2.7/dist-packages/nltk-2.0b5-py2.7.egg/nltk/parse/malt.py", line 122, in tagged_parse
    return DependencyGraph.load(output_file)
  File "/usr/local/lib/python2.7/dist-packages/nltk-2.0b5-py2.7.egg/nltk/parse/dependencygraph.py", line 121, in load
    return DependencyGraph(open(file).read())
IOError: [Errno 2] No such file or directory: '/tmp/malt_output.conll'

Please help me parse that sentence using MaltParser.

Recommended answer

Edited

Note that this answer no longer works, because the MaltParser API in NLTK has changed since August 2015. It is kept for legacy reasons.

Please see this answer to get MaltParser working with NLTK:

Disclaimer: This is not an eternal solution. The answer in the above link (posted in February 2016) works for now. But when the MaltParser or NLTK API changes, the syntax for using MaltParser in NLTK may change as well.

There are a couple of problems with your setup:

  • The input to train_from_file must be a file in CoNLL format, not a pre-trained model. For an mco file, you pass it to the MaltParser constructor using the mco and working_dir parameters.
  • The default Java heap allocation is not large enough to load that particular mco file, so you have to tell Java to use more heap space with the -Xmx parameter. Unfortunately this wasn't possible with the existing code, so I just checked in a change that adds a constructor parameter for extra Java arguments. See here.
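To make the first point concrete, here is a minimal sketch of what "a file in CoNLL format" means: one token per line, ten tab-separated columns, and a blank line after each sentence. The helper name to_conll and the sample annotation are assumptions made up for illustration; the heads and labels are hand-written and are not MaltParser output.

```python
# Sketch: render one annotated sentence as CoNLL-X lines (10 tab-separated columns).
# The helper name and the sample annotation below are illustrative assumptions.

CONLL_COLUMNS = ("ID", "FORM", "LEMMA", "CPOSTAG", "POSTAG",
                 "FEATS", "HEAD", "DEPREL", "PHEAD", "PDEPREL")


def to_conll(tokens):
    """tokens: list of (word, pos_tag, head_index, dependency_label) tuples.

    head_index is 1-based; 0 means the token attaches to the artificial root.
    Columns with no information are filled with "_", the CoNLL placeholder.
    """
    lines = []
    for i, (word, tag, head, deprel) in enumerate(tokens, start=1):
        fields = (str(i), word, "_", tag, tag, "_", str(head), deprel, "_", "_")
        lines.append("\t".join(fields))
    # A blank line terminates each sentence in a CoNLL file.
    return "\n".join(lines) + "\n\n"


# Hand-written annotation, for illustration only (not parser output).
sentence = [
    ("This", "DT", 2, "nsubj"),
    ("is", "VBZ", 0, "ROOT"),
    ("a", "DT", 5, "det"),
    ("test", "NN", 5, "nn"),
    ("sentence", "NN", 2, "attr"),
]
conll_text = to_conll(sentence)
print(conll_text)
```

A training file would contain many such sentences with real head and label annotations; a binary .mco model is a different kind of file, which is why it cannot be passed to train_from_file.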

Here's what you need to do:

First, get the latest NLTK revision:

git clone https://github.com/nltk/nltk.git

(NOTE: If you can't use the git version of NLTK, you'll have to update the file malt.py manually, or copy it from here to have your own version.)

Second, make the jar file available under the name malt.jar, which is what NLTK expects (a symlink is enough):

cd /usr/lib/
ln -s maltparser-1.7.2.jar malt.jar

Then add an environment variable pointing to MaltParser:

export MALTPARSERHOME="/Users/dhg/Downloads/maltparser-1.7.2"
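Before starting Python, it can help to confirm that the variable is set and that a malt.jar is actually reachable through it. The following is a small sketch using only the standard library. The function find_malt_jar is hypothetical and is not NLTK's own lookup logic; it also assumes that malt.jar sits directly inside the directory named by MALTPARSERHOME.

```python
import os


def find_malt_jar(env_var="MALTPARSERHOME", jar_name="malt.jar", environ=None):
    """Return the path to jar_name under the directory named by env_var, or None.

    Hypothetical helper: it only mirrors the setup steps described here.
    """
    environ = os.environ if environ is None else environ
    home = environ.get(env_var)
    if not home:
        return None  # variable missing or empty
    candidate = os.path.join(home, jar_name)
    return candidate if os.path.isfile(candidate) else None


jar_path = find_malt_jar()
if jar_path is None:
    print("malt.jar not found; check MALTPARSERHOME and the jar name")
else:
    print("Found MaltParser jar at", jar_path)
```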

Finally, load and use MaltParser in Python:

>>> import nltk
>>> parser = nltk.parse.malt.MaltParser(working_dir="/home/rohith/malt-1.7.2", 
...                                     mco="engmalt.linear-1.7", 
...                                     additional_java_args=['-Xmx512m'])
>>> txt = "This is a test sentence"
>>> graph = parser.raw_parse(txt)
>>> graph.tree().pprint()
'(This (sentence is a test))'
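For intuition about what the three constructor arguments control, here is a rough sketch of the kind of java command line a wrapper like this ends up running. It is an approximation written for illustration (in Python 3), not NLTK's actual implementation: build_malt_command is a hypothetical helper and the jar, input, and output paths are placeholders. The -w, -c, -i, -o and -m options are MaltParser command-line flags, and -Xmx512m is the heap setting from the answer.

```python
def build_malt_command(jar_path, working_dir, mco, input_path, output_path,
                       java_args=("-Xmx512m",)):
    """Assemble a java command line that runs MaltParser in parse mode.

    Hypothetical helper for illustration; all paths are supplied by the caller.
    """
    # The answer passes the model name without ".mco", so strip it if present.
    model_name = mco[:-4] if mco.endswith(".mco") else mco
    return (["java"] + list(java_args) +   # JVM options such as the heap size
            ["-jar", jar_path,
             "-w", working_dir,            # directory that contains the .mco file
             "-c", model_name,             # configuration (model) name
             "-i", input_path,             # CoNLL input file
             "-o", output_path,            # CoNLL output file
             "-m", "parse"])


cmd = build_malt_command(
    jar_path="/usr/lib/malt.jar",              # placeholder path
    working_dir="/home/rohith/malt-1.7.2",
    mco="engmalt.linear-1.7.mco",
    input_path="/tmp/malt_input.conll",        # placeholder path
    output_path="/tmp/malt_output.conll",
)
print(" ".join(cmd))
```

If the java process fails, for example because the heap is too small to load the model, the output file is never written. That is one plausible reading of the IOError in the question, where the wrapper tried to open /tmp/malt_output.conll afterwards.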
