OpenNLP vs Stanford CoreNLP


Question

I've been comparing these two packages and am not sure which direction to go in. Briefly, what I am looking for is:


  1. Named entity recognition (people, places, organizations, etc.).

  2. Gender identification.

  3. A decent training API.

From what I can tell, OpenNLP and Stanford CoreNLP expose pretty similar capabilities. However, Stanford CoreNLP looks like it has a lot more activity, whereas OpenNLP has only had a few commits in the last six months.

Based on what I saw, new models appear to be easier to train with OpenNLP, which might make it more attractive for that reason alone. However, my question is: what would others start with as the basis for adding NLP features to a Java app? I'm mostly worried about whether OpenNLP is "just mature" or semi-abandoned.

Recommended answer

In full disclosure, I'm a contributor to CoreNLP, so this is a biased answer. That said, here is my view on your three criteria:


  1. Named entity recognition: I think CoreNLP clearly wins here, both on accuracy and ease of use. For one, OpenNLP has a model per NER tag, whereas CoreNLP detects all tags with a single Annotator. Furthermore, temporal resolution with SUTime is a nice perk in CoreNLP. Accuracy-wise, my anecdotal experience is that CoreNLP does better on general-purpose text.

  2. Gender identification: I think both tools are rather poorly documented on this front. OpenNLP seems to have a GenderModel class; CoreNLP has a gender Annotator.

  3. Training API: I suspect the OpenNLP training API is easier to use for anything beyond off-the-shelf training. But if all you want to do is, e.g., train a model from a CoNLL file, both should be straightforward. Training tends to be faster with CoreNLP than with other tools I've tried, but I haven't benchmarked it formally, so take that with a grain of salt.
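As a rough illustration of the "train from a CoNLL file" case: OpenNLP's name finder trainer reads one sentence per line with entities wrapped in `<START:type> ... <END>` markers, so CoNLL-style column data has to be reshaped first. The sketch below does that reshaping using only the Java standard library. It assumes each input line is `token ... TAG` with BIO tags (`O`, `B-X`, `I-X`) in the last column; the class name `ConllToOpenNlp`, the method `convertSentence`, and the lowercasing of type labels are choices made for this example, not part of either library.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: reshapes one CoNLL-style sentence into an
// OpenNLP name-finder training line. Not part of OpenNLP or CoreNLP.
class ConllToOpenNlp {

    // Each input line is assumed to be "token ... TAG", with a BIO tag
    // (O, B-X, I-X) in the last whitespace-separated column.
    static String convertSentence(List<String> conllLines) {
        StringBuilder out = new StringBuilder();
        String open = null; // entity type of the span currently open, if any

        for (String line : conllLines) {
            if (line.trim().isEmpty()) {
                continue; // ignore blank lines inside the sentence
            }
            String[] cols = line.trim().split("\\s+");
            String token = cols[0];
            String tag = cols[cols.length - 1];
            String type = tag.equals("O") ? null : tag.substring(2);

            // A span starts on B-X, or on I-X when no span of type X is open.
            boolean startsNew = type != null
                    && (tag.startsWith("B-") || !type.equals(open));

            // Close the open span when we leave it or another one begins.
            if (open != null && (type == null || startsNew)) {
                out.append("<END> ");
                open = null;
            }
            if (startsNew) {
                out.append("<START:")
                   .append(type.toLowerCase(Locale.ROOT))
                   .append("> ");
                open = type;
            }
            out.append(token).append(' ');
        }
        if (open != null) {
            out.append("<END>"); // sentence ended inside an entity
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        List<String> sentence = Arrays.asList(
                "Pierre B-PER", "Vinken I-PER", "visited O", "Paris B-LOC");
        // prints: <START:per> Pierre Vinken <END> visited <START:loc> Paris <END>
        System.out.println(convertSentence(sentence));
    }
}
```

Writing one converted sentence per line to a text file should give input in the shape the OpenNLP name finder trainer expects. CoreNLP's NER trainer, by contrast, takes column-formatted files more or less directly, which is part of why both tools are straightforward for this case.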
