NLTK Stanford POS tagger error: Java command failed

Question

I'm trying to use the nltk.tag.stanford module to tag a sentence (starting with the example from the wiki), but I keep getting the following error:

Traceback (most recent call last):
  File "test.py", line 28, in <module>
    print st.tag(word_tokenize('What is the airspeed of an unladen swallow ?'))
  File "/usr/local/lib/python2.7/dist-packages/nltk/tag/stanford.py", line 59, in tag
    return self.tag_sents([tokens])[0]
  File "/usr/local/lib/python2.7/dist-packages/nltk/tag/stanford.py", line 81, in tag_sents
    stdout=PIPE, stderr=PIPE)
  File "/usr/local/lib/python2.7/dist-packages/nltk/internals.py", line 160, in java
    raise OSError('Java command failed!')
OSError: Java command failed!

Or the following LookupError:

LookupError: 

===========================================================================
NLTK was unable to find the java file!
Use software specific configuration paramaters or set the JAVAHOME environment variable.
===========================================================================

Here is the sample code:

>>> from nltk.tag.stanford import POSTagger
>>> st = POSTagger('/usr/share/stanford-postagger/models/english-bidirectional-distsim.tagger',
...                '/usr/share/stanford-postagger/stanford-postagger.jar') 
>>> st.tag('What is the airspeed of an unladen swallow ?'.split()) 

I also tried word_tokenize instead of split, but it made no difference.

I also reinstalled Java and the JDK, and none of my searches turned up a fix, including suggestions such as nltk.internals.config_java().

Note: I use Linux (Xubuntu).

Recommended answer

If you read through the embedded documentation in nltk/internals.py (lines 58 - 175), you should find your answer easily enough. NLTK requires the full path to the Java binary.

If the path is not specified, NLTK searches the system for a Java binary; if none is found, it raises a LookupError exception.
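To see what such a search is likely to find on your machine, you can run a rough stand-in written with the standard library alone. This is only a sketch and not NLTK's actual lookup code: the helper name `find_java` is hypothetical, and the order of checks (the `JAVAHOME` and `JAVA_HOME` variables first, then `PATH`) is an assumption made for illustration.

```python
import os
import shutil


def find_java(env_vars=("JAVAHOME", "JAVA_HOME"), environ=None):
    """Return a path to a java executable, or None if none is found.

    Hypothetical helper: it approximates a lookup, it is not NLTK's code.
    """
    environ = os.environ if environ is None else environ
    exe = "java.exe" if os.name == "nt" else "java"

    for var in env_vars:
        value = environ.get(var)
        if not value:
            continue
        # The variable may point at the binary itself or at a JDK/JRE root.
        candidates = (
            value,
            os.path.join(value, exe),
            os.path.join(value, "bin", exe),
        )
        for candidate in candidates:
            if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
                return candidate

    # Fall back to the directories listed in PATH.
    return shutil.which("java", path=environ.get("PATH", os.defpath))


if __name__ == "__main__":
    print(find_java() or "no java binary found")
```

If this prints "no java binary found", the LookupError above is the expected outcome, and one of the options below should help.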

Based on a bit of research, I believe you have a few options:

1) Add the following code to your project (not a great solution)

import os
java_path = "path/to/java" # replace this
os.environ['JAVAHOME'] = java_path

2) Uninstall and reinstall NLTK (preferably in a virtualenv) (better, but still not great)

pip uninstall nltk
sudo -E pip install nltk

3) Set the Java environment variables (the most practical solution, IMO)

Edit the system-wide profile file, /etc/profile:

sudo gedit /etc/profile

Add the following lines at the end:

JAVA_HOME=/usr/lib/jvm/jdk1.7.0
PATH=$PATH:$HOME/bin:$JAVA_HOME/bin
export JAVA_HOME
export JRE_HOME
export PATH
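After logging in again so that /etc/profile is re-read, a short check from Python can confirm that the interpreter running NLTK actually sees the new settings. The snippet below is a minimal sketch; `java_version` is a hypothetical helper name, and it only assumes that the `java` launcher accepts `-version`.

```python
import os
import subprocess


def java_version(java="java"):
    """Run `<java> -version` and return its banner, or None if it cannot run."""
    try:
        result = subprocess.run(
            [java, "-version"],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        # The binary is missing, not executable, or hung.
        return None
    if result.returncode != 0:
        return None
    # java writes the -version banner to stderr rather than stdout.
    return (result.stderr or result.stdout).strip()


if __name__ == "__main__":
    print("JAVA_HOME =", os.environ.get("JAVA_HOME"))
    print(java_version() or "java could not be run from this environment")
```

If `JAVA_HOME` prints as `None` or the version banner is missing, the profile changes have not reached this process yet.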
