How to use Stanford Parser in NLTK using Python
Question
Is it possible to use the Stanford Parser in NLTK? (I am not talking about the Stanford POS tagger.)
Recommended answer
Note that this answer applies to NLTK v3.0, and not to more recent versions.
Sure, try the following in Python:
import os
from nltk.parse import stanford

# Both variables point at the folder that holds the parser and models jars
os.environ['STANFORD_PARSER'] = '/path/to/stanford/jars'
os.environ['STANFORD_MODELS'] = '/path/to/stanford/jars'

parser = stanford.StanfordParser(model_path="/location/of/the/englishPCFG.ser.gz")
sentences = parser.raw_parse_sents(("Hello, My name is Melroy.", "What is your name?"))
print(sentences)

# GUI: draw each parse tree in its own window
for line in sentences:
    for sentence in line:
        sentence.draw()
Output:
[Tree('ROOT', [Tree('S', [Tree('INTJ', [Tree('UH', ['Hello'])]), Tree(',', [',']), Tree('NP', [Tree('PRP$', ['My']), Tree('NN', ['name'])]), Tree('VP', [Tree('VBZ', ['is']), Tree('ADJP', [Tree('JJ', ['Melroy'])])]), Tree('.', ['.'])])]), Tree('ROOT', [Tree('SBARQ', [Tree('WHNP', [Tree('WP', ['What'])]), Tree('SQ', [Tree('VBZ', ['is']), Tree('NP', [Tree('PRP$', ['your']), Tree('NN', ['name'])])]), Tree('.', ['?'])])])]
Note 1: In this example both the parser and model jars are in the same folder.
Note 2:
- The file name of the Stanford parser is: stanford-parser.jar
- The file name of the Stanford models is: stanford-parser-x.x.x-models.jar
Note 3: The englishPCFG.ser.gz file can be found inside the models.jar file (/edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz). Use an archive manager to 'unzip' the models.jar file.
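If you would rather not use an archive manager, the extraction in Note 3 can also be scripted, because a jar file is an ordinary zip archive. The following is only a minimal sketch using Python's standard zipfile module; the function name extract_model and its parameters are made up for illustration and are not part of NLTK or the Stanford Parser.

```python
import os
import zipfile

# Path of the model inside the models jar, as given in Note 3.
MODEL_MEMBER = "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz"


def extract_model(models_jar, target_dir, member=MODEL_MEMBER):
    """Extract one member from a jar and return the path of the extracted file.

    Hypothetical helper: it relies only on the fact that a jar is a zip archive.
    """
    with zipfile.ZipFile(models_jar) as jar:
        # extract() recreates the member's directory structure under target_dir
        # and returns the full path of the file it wrote.
        return jar.extract(member, path=target_dir)
```

Calling extract_model('/path/to/stanford/jars/stanford-parser-x.x.x-models.jar', '/location/of/the') would, under these assumptions, return the path to pass as model_path. Note that the file ends up inside the edu/stanford/nlp/models/lexparser subfolder of the target directory.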
Note 4: Make sure you are running Java 8 (JRE 1.8, as included in Oracle JDK 8). With an older Java version you will get: Unsupported major.minor version 52.0.
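To check which Java version is on your PATH before running the parser, something like the sketch below may help. It assumes Python 3 and that `java -version` prints a line containing version "…" (which is what Oracle and OpenJDK builds do); the helper names are invented for this example.

```python
import re
import shutil
import subprocess


def java_major_version(version_output):
    """Return the major Java version parsed from `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    # Java 8 and earlier report themselves as 1.x (for example "1.8.0_281").
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def installed_java_major():
    """Run `java -version` if java is on PATH; return the major version or None."""
    if shutil.which("java") is None:
        return None
    # `java -version` writes its report to stderr, so capture both streams.
    result = subprocess.run(
        ["java", "-version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return java_major_version(result.stderr + result.stdout)
```

If installed_java_major() returns a number below 8, the "Unsupported major.minor version 52.0" error from Note 4 is the likely outcome.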
Download NLTK v3 from https://github.com/nltk/nltk and install it:
sudo python setup.py install
You can use the NLTK downloader to get the Stanford Parser, using Python:
import nltk
nltk.download()
Try my example! (Don't forget to change the jar paths and to change the model path to the ser.gz location.)
Or:
Download and install NLTK v3, same as above.
Download the latest version from http://nlp.stanford.edu/software/lex-parser.shtml#Download (at the time of writing, the file name is stanford-parser-full-2015-01-29.zip).
Extract stanford-parser-full-20xx-xx-xx.zip.
Create a new folder ('jars' in my example). Place these two extracted files into that folder: stanford-parser-3.x.x-models.jar and stanford-parser.jar.
As shown above, you can use the environment variables (STANFORD_PARSER and STANFORD_MODELS) to point to this 'jars' folder. I'm using Linux, so if you use Windows please use something like: C://folder//jars.
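A wrong folder is an easy mistake here, so it can be worth checking that both jars are really present before setting the variables. This is just a sketch with a made-up helper name (point_nltk_at_jars); it assumes the file names from Note 2 and Python 3, and it does not call NLTK itself.

```python
import glob
import os


def point_nltk_at_jars(jars_dir):
    """Check that both jars exist in jars_dir, then set the two environment variables.

    Hypothetical helper; returns (parser_jar_path, models_jar_path).
    """
    parser_jar = os.path.join(jars_dir, "stanford-parser.jar")
    # The models jar carries a version number, so match it with a wildcard.
    models_jars = sorted(glob.glob(os.path.join(jars_dir, "stanford-parser-*-models.jar")))

    if not os.path.isfile(parser_jar):
        raise FileNotFoundError("stanford-parser.jar not found in " + jars_dir)
    if not models_jars:
        raise FileNotFoundError("no stanford-parser-*-models.jar found in " + jars_dir)

    # Same two variables as in the example at the top of this answer.
    os.environ["STANFORD_PARSER"] = jars_dir
    os.environ["STANFORD_MODELS"] = jars_dir
    return parser_jar, models_jars[0]
```

Call it once, for example point_nltk_at_jars('/path/to/stanford/jars'), before creating the StanfordParser instance.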
Open stanford-parser-3.x.x-models.jar using an archive manager (7zip).
Browse inside the jar file to edu/stanford/nlp/models/lexparser and extract the file called 'englishPCFG.ser.gz'. Remember where you extracted this ser.gz file.
When creating a StanfordParser instance, you can provide the model path as a parameter. This is the complete path to the model, in our case /location/of/the/englishPCFG.ser.gz.
Try my example! (Don't forget to change the jar paths and to change the model path to the ser.gz location.)