NLTK MaltParser won't parse


Question

I am trying to use MaltParser from NLTK.

I got as far as configuring and training the parser:

import nltk
parser = nltk.parse.malt.MaltParser()
parser.config_malt()
parser.train_from_file('malt_train.conll')

But when it comes to actual parsing, the parser raises an error:

File "<stdin>", line 1, in <module>
File "/Library/Python/2.7/site-packages/nltk/parse/malt.py", line 98, in raw_parse
return self.parse(words, verbose)
File "/Library/Python/2.7/site-packages/nltk/parse/malt.py", line 85, in parse
return self.tagged_parse(taggedwords, verbose)
File "/Library/Python/2.7/site-packages/nltk/parse/malt.py", line 139, in tagged_parse
return DependencyGraph.load(output_file)
File "/Library/Python/2.7/site-packages/nltk/parse/dependencygraph.py", line 121, in    load
return DependencyGraph(open(file).read())
IOError: [Errno 2] No such file or directory:'/var/folders/77/ch5yxf153jl67kmqr5jqywgr0000gn/T/malt_output.conll'

Here is the command that produces the error (printed from malt.py):

['java', '-jar /usr/lib/malt-1.6.1/malt.jar', '-w /var/folders/77/ch5yxf153jl67kmqr5jqywgr0000gn/T', '-c malt_temp', '-i /var/folders/77/ch5yxf153jl67kmqr5jqywgr0000gn/T/malt_input.conll', '-o /var/folders/77/ch5yxf153jl67kmqr5jqywgr0000gn/T/malt_output.conll', '-m parse']

I tried running just the java command on its own, and here is what I get:

 The file entry 'malt_temp_singlemalt.info' in the mco file '/var/folders/77/ch5yxf153jl67kmqr5jqywgr0000gn/T/malt_temp.mco' cannot be loaded.  

I also tried the same with the pre-trained models engmalt.poly.mco and engmalt.linear.mco.

Any suggestions are very welcome.

EDIT: Here is the full function from malt.py:

def tagged_parse(self, sentence, verbose=False):
    """
    Use MaltParser to parse a sentence. Takes a sentence as a list of
    (word, tag) tuples; the sentence must have already been tokenized and
    tagged.

    @param sentence: Input sentence to parse
    @type sentence: L{list} of (word, tag) L{tuple}s.
    @return: C{DependencyGraph} the dependency graph representation of the sentence
    """

    if not self._malt_bin:
        raise Exception("MaltParser location is not configured.  Call config_malt() first.")
    if not self._trained:
        raise Exception("Parser has not been trained.  Call train() first.")

    input_file = os.path.join(tempfile.gettempdir(), 'malt_input.conll')
    output_file = os.path.join(tempfile.gettempdir(), 'malt_output.conll')

    execute_string = 'java -jar %s -w %s -c %s -i %s -o %s -m parse'
    if not verbose:
        execute_string += ' > ' + os.path.join(tempfile.gettempdir(), "malt.out")

    f = None
    try:
        f = open(input_file, 'w')

        for (i, (word,tag)) in enumerate(sentence):
            f.write('%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' % 
                    (i+1, word, '_', tag, tag, '_', '0', 'a', '_', '_'))
        f.write('\n')
        f.close()

        cmd = ['java', '-jar %s' % self._malt_bin, '-w %s' % tempfile.gettempdir(), 
               '-c %s' % self.mco, '-i %s' % input_file, '-o %s' % output_file, '-m parse']
        print cmd

        self._execute(cmd, 'parse', verbose)

        return DependencyGraph.load(output_file)
    finally:
        if f: f.close()

Recommended answer

I am not sure whether the problem is still unsolved (I think it has probably been solved already), but since I ran into the same problem a while ago, I would like to share what I learned.

First of all, the MaltParser jar does not accept a .conll file that is given with its full path in front of the file name, as in the command shown above. Why that is so, I do not know.

But you can easily fix it by changing the command line to something like this:

cmd = ['java', '-jar %s' % self._malt_bin, '-w %s' % self.working_dir,
       '-c %s' % self.mco, '-i %s' % input_file, '-o %s' % output_file, '-m parse']

Here the directory that holds the .conll file is set with the -w parameter. That way you can load any .conll file from any given folder. I also changed tempfile.gettempdir() to self.working_dir, because the "original" NLTK version always sets the /tmp/ folder as the working directory, even if you initialise MaltParser with a different working directory.
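As a rough illustration of that change, the sketch below builds the command list outside of NLTK. The function name build_malt_command and its parameters are hypothetical, and passing only the bare file names for -i and -o is an assumption based on the description above (the directory comes from -w), not something confirmed by the MaltParser documentation.

```python
import os


def build_malt_command(malt_jar, working_dir, model_name, input_file, output_file):
    """Build a MaltParser command list in the same shape as the one in malt.py.

    Hypothetical helper: the name and parameters are not part of NLTK.
    """
    return [
        'java',
        '-jar %s' % malt_jar,
        # The working directory is passed explicitly instead of tempfile.gettempdir().
        '-w %s' % working_dir,
        '-c %s' % model_name,
        # Assumption: only the file name is passed; the directory comes from -w.
        '-i %s' % os.path.basename(input_file),
        '-o %s' % os.path.basename(output_file),
        '-m parse',
    ]


cmd = build_malt_command(
    '/usr/lib/malt-1.6.1/malt.jar',
    '/tmp/work',
    'malt_temp',
    '/tmp/work/malt_input.conll',
    '/tmp/work/malt_output.conll',
)
```

The input and output files then have to live in (or be written to) the directory given as working_dir.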

I hope this information helps someone.

One more thing: if you want to parse many sentences at once, but each one individually and independently of the others, you have to separate the sentences with a blank line in the input .conll file and restart the token numbering at 1 for each sentence.
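A minimal sketch of writing such a multi-sentence input, reusing the ten tab-separated columns that tagged_parse writes in the question. The function name sentences_to_conll is hypothetical and not part of NLTK.

```python
def sentences_to_conll(tagged_sentences):
    """Format tagged sentences as CoNLL-style text, one token per line.

    tagged_sentences is a list of sentences, each a list of (word, tag) tuples.
    """
    lines = []
    for sentence in tagged_sentences:
        # Token numbering restarts at 1 for every sentence.
        for index, (word, tag) in enumerate(sentence, start=1):
            # Same column layout as the f.write() call in tagged_parse.
            columns = (str(index), word, '_', tag, tag, '_', '0', 'a', '_', '_')
            lines.append('\t'.join(columns))
        # A blank line marks the end of each sentence.
        lines.append('')
    return '\n'.join(lines) + '\n'


conll_text = sentences_to_conll([
    [('I', 'PRP'), ('run', 'VBP')],
    [('Stop', 'VB')],
])
```

The resulting string can be written to the input file in place of the single-sentence loop.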
