Stanford Parser multithread usage

Question

Stanford Parser is now 'thread-safe' as of version 2.0 (02.03.2012). I am currently running the command line tools and cannot figure out how to make use of my multiple cores by threading the program.

In the past, this question has been answered with "Stanford Parser is not thread-safe", as the FAQ still says. I am hoping to find someone who has had success threading the latest version.

I have tried using -t flag (-t10 and -tLLP) since that was all I could find in my searches, but both throw errors.

An example of the command I am issuing is:

java -cp stanford-parser.jar edu.stanford.nlp.parser.lexparser.LexicalizedParser \
  -outputFormat "oneline" ./grammar/englishPCFG.ser.gz ./corpus > corpus.lex

Recommended answer

Starting with version 2.0.5, you can now easily use multiple threads with the option -nthreads k. For example, your command can be like this:

java -mx6g edu.stanford.nlp.parser.lexparser.LexicalizedParser -nthreads 4 edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz file.txt > file.stp

(Releases of version 2 prior to 2013 had no way to enable multithreading from the command-line, but only when using the API.)

Internally, you can run as many parsing threads as you want inside one JVM process. You can do this either explicitly, by getting and using multiple LexicalizedParserQuery objects (via the parserQuery() method), or implicitly, by calling apply(...) or parseTree(...) on one LexicalizedParser. The -nthreads k option does this for you by sending successive sentences to different parsers using the Executor framework. You can also create multiple LexicalizedParser instances at the same time, e.g., for parsing different languages.
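The dispatch pattern described above (successive sentences handed to worker threads through the Executor framework) can be sketched roughly as follows. This is only an illustration and not the parser's actual implementation: the class name `ParallelParseSketch`, the method `parseAll`, and the `parseFn` parameter are hypothetical. `parseFn` stands in for a thread-safe parse call, which in real use would wrap something like apply(...) or parseTree(...) on a shared LexicalizedParser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ParallelParseSketch {

    /**
     * Runs parseFn on each sentence using a fixed pool of nThreads workers
     * and returns the results in the same order as the input.
     * parseFn is a hypothetical stand-in for a thread-safe parse call.
     */
    public static List<String> parseAll(List<String> sentences,
                                        Function<String, String> parseFn,
                                        int nThreads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            // Submit one task per sentence; keep the futures in input order.
            List<Future<String>> futures = new ArrayList<>();
            for (String sentence : sentences) {
                Callable<String> task = () -> parseFn.apply(sentence);
                futures.add(pool.submit(task));
            }
            // Collect results in submission order, blocking on each in turn.
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumption: a fake "parser" that just wraps the sentence in brackets.
        Function<String, String> fakeParser = s -> "(ROOT " + s + ")";
        List<String> trees =
                parseAll(List.of("The cat sat .", "Dogs bark ."), fakeParser, 4);
        trees.forEach(System.out::println);
    }
}
```

Collecting the futures in submission order keeps the output aligned with the input sentences even when the workers finish out of order.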

Multiple LexicalizedParserQuery objects share the same grammar (LexicalizedParser), but the memory savings aren't huge, as most of the memory goes to the transient structures used in chart parsing. So, if you are running lots of parsing threads concurrently, you will need to give a lot of memory to the JVM, as in the example above.
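Since each parsing thread needs its own working memory, one way to pick a value for -nthreads is to cap it by both the core count and the heap size. The sketch below is a rough illustration of that idea. The class `ThreadBudgetSketch`, the method `chooseThreads`, and the per-thread budget are all assumptions: the 1.5 GB figure simply divides the 6g heap in the example command by its 4 threads and is not a measured requirement.

```java
public class ThreadBudgetSketch {

    /**
     * Returns the number of threads allowed by both the core count and the
     * heap size, given an assumed memory budget per parsing thread.
     * Always returns at least 1.
     */
    public static int chooseThreads(int cores, long maxHeapBytes, long bytesPerThread) {
        if (cores < 1 || bytesPerThread < 1) {
            throw new IllegalArgumentException("cores and bytesPerThread must be positive");
        }
        long byMemory = maxHeapBytes / bytesPerThread;
        long chosen = Math.min((long) cores, byMemory);
        return (int) Math.max(1L, chosen);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // Assumption: 1.5 GB per thread (6g / 4 threads in the example command).
        long perThread = 1536L * 1024 * 1024;
        int n = chooseThreads(rt.availableProcessors(), rt.maxMemory(), perThread);
        System.out.println("suggested -nthreads value: " + n);
    }
}
```

The real per-thread requirement depends on the grammar and on sentence length, so the budget is best treated as a tunable starting point.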

P.S. Sorry, yes, some of the documentation still needs updating. But -tLPP is a flag for specifying language-specific resources; the Stanford Parser has no -t flag.
