How to get probability of prediction per entity from Spacy NER model?


Question


I used this official example code to train an NER model from scratch using my own training samples.

When I predict using this model on new text, I want to get the probability of prediction of each entity.

    # test the saved model
    print("Loading from", output_dir)
    nlp2 = spacy.load(output_dir)
    for text, _ in TRAIN_DATA:
        doc = nlp2(text)
        print("Entities", [(ent.text, ent.label_) for ent in doc.ents])
        print("Tokens", [(t.text, t.ent_type_, t.ent_iob) for t in doc])

I am unable to find a method in spaCy that returns the prediction probability of each entity.

How do I get this probability from spaCy? I need it so that I can apply a cutoff to it.

Solution

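Getting the per-entity prediction probabilities from a spaCy NER model is not trivial. Here is a solution adapted from here. Note that it relies on the spaCy 2.x API (`nlp.entity`), which is not available under that name in spaCy 3: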


    import spacy
    from collections import defaultdict

    texts = ['John works at Microsoft.']

    # Number of alternate analyses to consider. More is slower, and not
    # necessarily better -- you need to experiment on your problem.
    beam_width = 16
    # This clips solutions at each step. We multiply the score of the
    # top-ranked action by this value, and use the result as a threshold.
    # This prevents the parser from exploring options that look very unlikely,
    # saving a bit of efficiency. Accuracy may also improve, because we've
    # trained on a greedy objective.
    beam_density = 0.0001
    nlp = spacy.load('en_core_web_md')

    # Run the pipeline without NER, then beam-parse the docs with the NER component.
    docs = list(nlp.pipe(texts, disable=['ner']))
    beams = nlp.entity.beam_parse(docs, beam_width=beam_width, beam_density=beam_density)

    for doc, beam in zip(docs, beams):
        # Sum the scores of every beam parse that contains a given (start, end, label).
        entity_scores = defaultdict(float)
        for score, ents in nlp.entity.moves.get_beam_parses(beam):
            for start, end, label in ents:
                entity_scores[(start, end, label)] += score

        # Collect and print the results per document (start/end are token offsets).
        results = [{'start': k[0], 'end': k[1], 'label': k[2], 'prob': v}
                   for k, v in entity_scores.items()]
        for a in sorted(results, key=lambda x: x['start']):
            print(a)

Output:

    {'start': 0, 'end': 1, 'label': 'PERSON', 'prob': 0.4054479906820232}
    {'start': 0, 'end': 1, 'label': 'ORG', 'prob': 0.01002015005487447}
    {'start': 0, 'end': 1, 'label': 'PRODUCT', 'prob': 0.0008592912552754791}
    {'start': 0, 'end': 1, 'label': 'WORK_OF_ART', 'prob': 0.0007666755792166002}
    {'start': 0, 'end': 1, 'label': 'NORP', 'prob': 0.00034931990870877333}
    {'start': 0, 'end': 1, 'label': 'TIME', 'prob': 0.0002786051849320804}
    {'start': 3, 'end': 4, 'label': 'ORG', 'prob': 0.9990115861687987}
    {'start': 3, 'end': 4, 'label': 'PRODUCT', 'prob': 0.0003378157477046507}
    {'start': 3, 'end': 4, 'label': 'FAC', 'prob': 8.249734411749544e-05}
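
Since the question asks for a cutoff, here is a minimal sketch of one way to apply it to scores shaped like the output above. The helper name `filter_entities` and the default `cutoff=0.5` are assumptions made for illustration; they are not part of spaCy. The sketch keeps the highest-scoring label for each token span and drops it if its score falls below the cutoff.

```python
def filter_entities(scored, cutoff=0.5):
    """Keep the best-scoring label per (start, end) span if it meets the cutoff.

    `scored` is a list of dicts with 'start', 'end', 'label' and 'prob' keys,
    in the same shape as the dicts printed above. Hypothetical helper.
    """
    best = {}
    for item in scored:
        span = (item['start'], item['end'])
        # Remember only the highest-probability candidate for each span.
        if span not in best or item['prob'] > best[span]['prob']:
            best[span] = item
    ordered = sorted(best.values(), key=lambda x: x['start'])
    return [e for e in ordered if e['prob'] >= cutoff]


# A subset of the scores printed above.
scored = [
    {'start': 0, 'end': 1, 'label': 'PERSON', 'prob': 0.4054479906820232},
    {'start': 0, 'end': 1, 'label': 'ORG', 'prob': 0.01002015005487447},
    {'start': 3, 'end': 4, 'label': 'ORG', 'prob': 0.9990115861687987},
    {'start': 3, 'end': 4, 'label': 'PRODUCT', 'prob': 0.0003378157477046507},
]

kept = filter_entities(scored, cutoff=0.5)
print(kept)  # only the ORG entity at tokens 3-4 survives a 0.5 cutoff
```

The scores depend on `beam_width` and `beam_density`, so a suitable cutoff value is something to tune on your own data rather than a fixed constant.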
