Why does Stanford CoreNLP NER-annotator load 3 models by default?

Question

When I add the "ner" annotator to my StanfordCoreNLP object pipeline, I can see that it loads 3 models, which takes a lot of time:

Adding annotator ner
Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [10.3 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [10.1 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [6.5 sec].
Initializing JollyDayHoliday for SUTime from classpath: edu/stanford/nlp/models/sutime/jollyday/Holidays_sutime.xml as sutime.binder.1.
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/defs.sutime.txt
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.sutime.txt
Reading TokensRegex rules from edu/stanford/nlp/models/sutime/english.holidays.sutime.txt

Is there a way to just load a subset that will work equally? Particularly, I am unsure why it is loading the 3-class and 4-class NER models when it has the 7-class model, and I'm wondering if not loading these two will still work.

Answer

You can set which models are loaded in this manner:

Command line:

-ner.model model_path1,model_path2

Java code:

 props.put("ner.model", "model_path1,model_path2");

Where model_path1 and model_path2 should be something like:

"edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz"

The models are applied in layers. The first model is run and its tags applied. Then the second, the third, and so on. If you want fewer models, you can put 1 or 2 models in the list instead of the default three, but this will change how the system performs.

If you set "ner.combinationMode" to "HIGH_RECALL", all models will be allowed to apply all of their tags. If you set "ner.combinationMode" to "NORMAL", then a future model cannot apply any tags set by previous models.
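
Continuing in the same style, here is a sketch that keeps two of the models and sets the combination mode explicitly; the particular choice of the 3-class and MUC 7-class models is just an illustration:

    import java.util.Properties;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;

    public class TwoModelNer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner");
            // Two models instead of the default three; they are applied in this order, as layers
            props.setProperty("ner.model",
                "edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz,"
                + "edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz");
            // HIGH_RECALL lets every model apply all of its tags;
            // with NORMAL, a later model cannot apply tags already set by an earlier one
            props.setProperty("ner.combinationMode", "HIGH_RECALL");
            StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
            // ... use the pipeline as in the earlier sketch
        }
    }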

The three default models were trained on different data. For instance, the 3-class model was trained on substantially more data than the 7-class model. So each model is doing something different, and their results are combined to create the final tag sequence.
