Stanford NLP 3.9.0: Does using CoreEntityMention combine adjacent entity mentions?

Question

I am testing the new 3.9.0 way of getting entity mentions, using CoreEntityMention. I do something like:

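    // Fragment from inside a method; text, logger, stanfordPipe,
    // createNerPipeline() and stringify() are defined elsewhere in my code.
    CoreDocument document = new CoreDocument(text);
    stanfordPipe = createNerPipeline();   // builds a StanfordCoreNLP pipeline that includes ner
    stanfordPipe.annotate(document);

    for (CoreSentence sentence : document.sentences()) {
        logger.debug("Found sentence {}", sentence);
        // entityMentions() returns null if no mention annotations were attached to this sentence
        if (sentence.entityMentions() == null) continue;
        for (CoreEntityMention cem : sentence.entityMentions()) {
            logger.debug("Found em {}", stringify(cem));
        }
    }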

When I iterate through entity mentions using sentence.entityMentions(), I see that some of the mentions produced span multiple tokens. The old way of getting entity mentions, and correct me if I am wrong, was to iterate over the CoreLabel tokens, which meant combining multi-token entity mentions yourself.
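
To make "combining multi-token entity mentions yourself" concrete, here is a minimal sketch of that manual step in plain Java. It assumes every token has already been tagged (in CoreNLP the word and tag would come from CoreLabel.word() and CoreLabel.ner()); the Token class, the mergeMentions method, and the tag values are hypothetical stand-ins chosen so the example runs without CoreNLP on the classpath. It is not how CoreNLP itself builds mentions.

```java
import java.util.ArrayList;
import java.util.List;

class MergeAdjacentNer {

    // Hypothetical stand-in for a tagged token; in CoreNLP these values
    // would come from CoreLabel.word() and CoreLabel.ner().
    static final class Token {
        final String word;
        final String ner;

        Token(String word, String ner) {
            this.word = word;
            this.ner = ner;
        }
    }

    /** Merges runs of adjacent tokens that share the same non-"O" NER tag. */
    static List<String> mergeMentions(List<Token> tokens) {
        List<String> mentions = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        String currentTag = "O";
        for (Token t : tokens) {
            if (!t.ner.equals("O") && t.ner.equals(currentTag)) {
                // Same entity tag as the previous token: extend the open mention.
                current.append(' ').append(t.word);
            } else {
                // Tag changed: close the open mention (if any) before starting over.
                if (!currentTag.equals("O")) {
                    mentions.add(current + " (" + currentTag + ")");
                }
                current.setLength(0);
                current.append(t.word);
                currentTag = t.ner;
            }
        }
        // Close a mention that runs to the end of the sentence.
        if (!currentTag.equals("O")) {
            mentions.add(current + " (" + currentTag + ")");
        }
        return mentions;
    }

    public static void main(String[] args) {
        List<Token> tokens = List.of(
                new Token("Joe", "PERSON"), new Token("Smith", "PERSON"),
                new Token("went", "O"), new Token("to", "O"),
                new Token("Hawaii", "LOCATION"), new Token(".", "O"));
        System.out.println(mergeMentions(tokens)); // prints [Joe Smith (PERSON), Hawaii (LOCATION)]
    }
}
```

This naive rule merges any run of identically tagged tokens, so two different entities of the same type that are adjacent in the text would come out as a single mention.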

So is there some new method, one that did not exist before, that combines adjacent tokens with the same NER label? Or have I missed older ways to combine multi-token entity mentions?

Answer

Hi, thanks for using the new interface!

Yes, a CoreEntityMention is meant to represent a full entity mention. It is part of some new syntax we added to make our code easier to work with.

Traditionally, you needed calls like sentence.get(CoreAnnotations.TokensAnnotation.class) and so on, so we added some wrapper classes that let people use the pipeline interface without the cumbersome syntax.

With this newly debuted syntax, you can write:

    sentence.tokens();
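
As a rough, self-contained illustration of the wrapper idea (a map keyed by annotation classes, plus a thin class with plain getters on top), here is a miniature in plain Java. TypedMap, Key, TokensKey, Sentence, and WrapperDemo are hypothetical classes that only mimic the shape of a lookup such as sentence.get(CoreAnnotations.TokensAnnotation.class); they are not the real CoreNLP types, and the actual CoreSentence implementation may differ.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class WrapperDemo {
    public static void main(String[] args) {
        TypedMap raw = new TypedMap();
        raw.set(TokensKey.class, List.of("Joe", "Smith", "went", "to", "Hawaii", "."));

        // Keyed style, analogous to sentence.get(CoreAnnotations.TokensAnnotation.class)
        List<String> viaKey = raw.get(TokensKey.class);
        // Wrapper style, analogous to sentence.tokens()
        List<String> viaWrapper = new Sentence(raw).tokens();

        System.out.println(viaKey.equals(viaWrapper)); // prints true
    }
}

// Hypothetical key type whose generic parameter records the stored value's type.
interface Key<V> {}

// Hypothetical key, playing the role of CoreAnnotations.TokensAnnotation.
final class TokensKey implements Key<List<String>> {}

// Hypothetical map keyed by key classes, similar in spirit to CoreMap.
class TypedMap {
    private final Map<Class<?>, Object> values = new HashMap<>();

    <V> void set(Class<? extends Key<V>> key, V value) {
        values.put(key, value);
    }

    @SuppressWarnings("unchecked")
    <V> V get(Class<? extends Key<V>> key) {
        // The cast is safe as long as values are only stored through set(),
        // whose signature ties the value type V to the key class.
        return (V) values.get(key);
    }
}

// Hypothetical thin wrapper: callers write tokens() instead of the keyed get(...) call.
class Sentence {
    private final TypedMap map;

    Sentence(TypedMap map) {
        this.map = map;
    }

    List<String> tokens() {
        return map.get(TokensKey.class);
    }
}
```

The keyed map stays type-safe and extensible; the wrapper simply hides the key-class argument behind a named method.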

关于实体提及,如果句子是乔·史密斯去夏威夷".您会得到两个实体提及:

Regarding entity mentions, if the sentence is "Joe Smith went to Hawaii.", you would get two entity mentions:

Joe Smith (2 tokens) and Hawaii (1 token).

Traditionally, the ner annotator would tag every token in the sentence with its named entity type. Then a separate entitymentions annotator would build Mention annotations, which were CoreMap representations of full entity mentions (e.g. Joe Smith).

Over the years I've seen a lot of people ask, "How do I go from a tagged sequence of tokens to the full entity mentions?" So, in response, we tried to make it much easier to extract the full entities referred to in the sentence.

I should also note that for the most part the older ways should still work. Updated documentation is on the way as we work on finalizing the 3.9.0 release!
