How to get probability of prediction per entity from Spacy NER model?


Problem description

I used this official example code to train an NER model from scratch on my own training samples.

When I predict using this model on new text, I want to get the probability of prediction of each entity.

    # test the saved model
    print("Loading from", output_dir)
    nlp2 = spacy.load(output_dir)
    for text, _ in TRAIN_DATA:
        doc = nlp2(text)
        print("Entities", [(ent.text, ent.label_) for ent in doc.ents])
        print("Tokens", [(t.text, t.ent_type_, t.ent_iob) for t in doc])

I am unable to find a method in Spacy to get the probability of prediction of each entity.

How do I get this probability from Spacy? I need it to apply a cutoff on it.

Solution

Getting per-entity prediction probabilities from a Spacy NER model is not trivial. Here is a solution adapted from here:


    import spacy
    from collections import defaultdict

    texts = ['John works at Microsoft.']

    # Number of alternate analyses to consider. More is slower, and not
    # necessarily better, so you need to experiment on your problem.
    beam_width = 16
    # This clips solutions at each step. We multiply the score of the
    # top-ranked action by this value, and use the result as a threshold.
    # This prevents the parser from exploring options that look very
    # unlikely, saving a bit of efficiency. Accuracy may also improve,
    # because we've trained on a greedy objective.
    beam_density = 0.0001

    nlp = spacy.load('en_core_web_md')
    # Fetch the NER component from the pipeline.
    ner = nlp.get_pipe('ner')

    # Run the pipeline without NER, then beam-parse the docs with the NER component.
    docs = list(nlp.pipe(texts, disable=['ner']))
    beams = ner.beam_parse(docs, beam_width=beam_width, beam_density=beam_density)

    for doc, beam in zip(docs, beams):
        # Sum the scores of all beam parses that contain a given entity.
        entity_scores = defaultdict(float)
        for score, ents in ner.moves.get_beam_parses(beam):
            for start, end, label in ents:
                entity_scores[(start, end, label)] += score

        # Report this doc's candidate entities, ordered by start token.
        results = [
            {'start': start, 'end': end, 'label': label, 'prob': prob}
            for (start, end, label), prob in entity_scores.items()
        ]
        for result in sorted(results, key=lambda r: r['start']):
            print(result)

Output:

    {'start': 0, 'end': 1, 'label': 'PERSON', 'prob': 0.4054479906820232}
    {'start': 0, 'end': 1, 'label': 'ORG', 'prob': 0.01002015005487447}
    {'start': 0, 'end': 1, 'label': 'PRODUCT', 'prob': 0.0008592912552754791}
    {'start': 0, 'end': 1, 'label': 'WORK_OF_ART', 'prob': 0.0007666755792166002}
    {'start': 0, 'end': 1, 'label': 'NORP', 'prob': 0.00034931990870877333}
    {'start': 0, 'end': 1, 'label': 'TIME', 'prob': 0.0002786051849320804}
    {'start': 3, 'end': 4, 'label': 'ORG', 'prob': 0.9990115861687987}
    {'start': 3, 'end': 4, 'label': 'PRODUCT', 'prob': 0.0003378157477046507}
    {'start': 3, 'end': 4, 'label': 'FAC', 'prob': 8.249734411749544e-05}
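
Since the question asks for a cutoff, here is a minimal sketch of one way to apply it to scores shaped like the `entity_scores` dictionary above. It needs no Spacy install: the sample values are copied from the output above (a subset, for brevity). The helper name `apply_cutoff` and the threshold of 0.5 are illustrative assumptions, not part of the original answer or of Spacy's API. The sketch keeps the best-scoring label for each span and drops it if its score falls below the threshold.

```python
# Scores copied from the output above, keyed by (start, end, label).
entity_scores = {
    (0, 1, "PERSON"): 0.4054479906820232,
    (0, 1, "ORG"): 0.01002015005487447,
    (0, 1, "PRODUCT"): 0.0008592912552754791,
    (3, 4, "ORG"): 0.9990115861687987,
    (3, 4, "PRODUCT"): 0.0003378157477046507,
}


def apply_cutoff(scores, threshold):
    """Hypothetical helper: keep the top label per span if it reaches the threshold."""
    best = {}
    for (start, end, label), prob in scores.items():
        span = (start, end)
        # Remember only the highest-scoring label seen for this span.
        if span not in best or prob > best[span][1]:
            best[span] = (label, prob)
    # Return surviving entities ordered by token position.
    return [
        (start, end, label, prob)
        for (start, end), (label, prob) in sorted(best.items())
        if prob >= threshold
    ]


kept = apply_cutoff(entity_scores, threshold=0.5)
print(kept)  # [(3, 4, 'ORG', 0.9990115861687987)]
```

With this threshold, 'Microsoft' (tokens 3 to 4) survives as ORG, while 'John' is dropped because its best label, PERSON, scores only about 0.41. A suitable threshold depends on your data and on the beam settings, so it is worth tuning on held-out examples.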
