Can't make Stanford POS tagger work in NLTK


Problem description

I'm trying to use the Stanford POS tagger from within NLTK. I'm following the example shown here:

http://www.nltk.org/api/nltk.tag.html#module-nltk.tag.stanford

I'm able to load everything without problems:

>>> import os
>>> from nltk.tag import StanfordPOSTagger
>>> os.environ['STANFORD_MODELS'] = '/path/to/stanford/folder/models'

>>> st = StanfordPOSTagger('english-bidirectional-distsim.tagger',path_to_jar='/path/to/stanford/folder/stanford-postagger.jar')

But on the first call:

>>> st.tag('What is the airspeed of an unladen swallow ?'.split())

it gives me the following error:

Loading default properties from tagger /path/to/stanford/folder/models/english-bidirectional-distsim.tagger
Exception in thread "main" java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory
    at edu.stanford.nlp.io.IOUtils.<clinit>(IOUtils.java:41)
    at edu.stanford.nlp.tagger.maxent.TaggerConfig.<init>(TaggerConfig.java:146)
    at edu.stanford.nlp.tagger.maxent.TaggerConfig.<init>(TaggerConfig.java:128)
    at edu.stanford.nlp.tagger.maxent.MaxentTagger.main(MaxentTagger.java:1836)
Caused by: java.lang.ClassNotFoundException: org.slf4j.LoggerFactory
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 4 more

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/miguelwon/anaconda/lib/python2.7/site-packages/nltk/tag/stanford.py", line 66, in tag
    return sum(self.tag_sents([tokens]), []) 
  File "/Users/miguelwon/anaconda/lib/python2.7/site-packages/nltk/tag/stanford.py", line 89, in tag_sents
    stdout=PIPE, stderr=PIPE)
  File "/Users/miguelwon/anaconda/lib/python2.7/site-packages/nltk/internals.py", line 134, in java
    raise OSError('Java command failed : ' + str(cmd))
OSError: Java command failed : [u'/usr/bin/java', '-mx1000m', '-cp', '/path/to/stanford/folder/stanford-postagger-full-2015-12-09/stanford-postagger.jar', 'edu.stanford.nlp.tagger.maxent.MaxentTagger', '-model', '/Users/miguelwon/Documents/Kaggel/RTE/stanford-postagger-full-2015-12-09/models/english-bidirectional-distsim.tagger', '-textFile', '/var/folders/vb/dy__dnps7qz35slpmfkc25g40000gn/T/tmpwieb0M', '-tokenize', 'false', '-outputFormatOptions', 'keepEmptySentences', '-encoding', 'utf8']
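
The Java stack trace reports that the class org.slf4j.LoggerFactory could not be found, and the failing command puts only stanford-postagger.jar on the classpath. A quick way to check whether a given jar actually ships that class is to inspect it as a zip archive. The following is only a diagnostic sketch: the helper name `jar_contains_class` is hypothetical and is not part of NLTK or the Stanford tools.

```python
import zipfile


def jar_contains_class(jar_path, class_name):
    """Return True if the jar (a zip archive) has an entry for class_name.

    class_name is a dotted Java name such as 'org.slf4j.LoggerFactory'.
    Hypothetical helper for illustration only.
    """
    # Java classes are stored in a jar as path/to/ClassName.class
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()


# Example call (the path is a placeholder, adjust it to your install):
# jar_contains_class('/path/to/stanford/folder/stanford-postagger.jar',
#                    'org.slf4j.LoggerFactory')
```

If the check returns False for every jar on the classpath, the JVM has no way to load the class, which matches the ClassNotFoundException above.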

Recommended answer

A lot has changed since this solution was posted. Here is the code that worked for me after I ran into the same error. Basically, increasing the Java heap size solved it.

import os

# Point NLTK at the Java executable (adjust to your local install).
java_path = "C:\\Program Files\\Java\\jdk1.8.0_102\\bin\\java.exe"
os.environ['JAVAHOME'] = java_path

from nltk.tag.stanford import StanfordPOSTagger

path_to_model = "stanford-postagger-2015-12-09/models/english-bidirectional-distsim.tagger"
path_to_jar = "stanford-postagger-2015-12-09/stanford-postagger.jar"
tagger = StanfordPOSTagger(path_to_model, path_to_jar)
tagger.java_options = '-mx4096m'  # raise the JVM memory limit for long sentences
sentence = 'This is testing'
print(tagger.tag(sentence.split()))
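
To see where the memory setting ends up, compare it with the failing command in the question's traceback, which passes '-mx1000m' right after the Java executable. The sketch below rebuilds a command line of that shape with a configurable heap option. It is only an illustration modelled on the traceback, not NLTK's actual implementation; the function `build_java_command` and the file names passed to it are hypothetical.

```python
def build_java_command(jar, model, text_file,
                       java_options="-mx1000m", java_bin="java"):
    """Assemble a MaxentTagger command line shaped like the one in the traceback.

    Hypothetical helper for illustration; it only builds the argument list
    and does not run Java.
    """
    return [
        java_bin, java_options,
        "-cp", jar,
        "edu.stanford.nlp.tagger.maxent.MaxentTagger",
        "-model", model,
        "-textFile", text_file,
        "-tokenize", "false",
        "-outputFormatOptions", "keepEmptySentences",
        "-encoding", "utf8",
    ]


# Placeholder file names; the heap option mirrors tagger.java_options above.
cmd = build_java_command(
    "stanford-postagger.jar",
    "english-bidirectional-distsim.tagger",
    "input.txt",
    java_options="-mx4096m",
)
print(" ".join(cmd))
```

Printing the list this way makes it easy to confirm that the larger heap option replaced the default before the tagger is invoked.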
