How to get probability of prediction per entity from Spacy NER model?

Views: 127 | Published: 2020/10/19 21:50:26 | Tags: python deep-learning nlp spacy ner


Problem description

I used this official example code to train an NER model from scratch using my own training samples.

When I run this model on new text, I want to get the prediction probability for each entity.

    # test the saved model
    print("Loading from", output_dir)
    nlp2 = spacy.load(output_dir)
    for text, _ in TRAIN_DATA:
        doc = nlp2(text)
        print("Entities", [(ent.text, ent.label_) for ent in doc.ents])
        print("Tokens", [(t.text, t.ent_type_, t.ent_iob) for t in doc])

I am unable to find a method in Spacy that returns the prediction probability of each entity.

How do I get this probability from Spacy? I need it so that I can apply a cutoff to the predictions.

Solution

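Getting per-entity prediction probabilities from a Spacy NER model is not trivial. Here is a solution, adapted from here, that runs the NER component with beam search and sums the scores of all beam parses that contain a given entity: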


    import spacy
    from collections import defaultdict

    texts = ['John works at Microsoft.']

    # Number of alternative analyses to keep on the beam. More is slower and
    # not necessarily better; you need to experiment on your own problem.
    beam_width = 16
    # Clips the candidate solutions at each step. The score of the top-ranked
    # action is multiplied by this value and the result is used as a threshold,
    # so the parser does not explore options that look very unlikely. This
    # saves some work, and accuracy may also improve because the model was
    # trained with a greedy objective.
    beam_density = 0.0001
    nlp = spacy.load('en_core_web_md')

    # Run every pipeline component except NER, then run NER with beam search.
    # Note: nlp.entity is the spaCy 2.x shortcut for the NER component.
    docs = list(nlp.pipe(texts, disable=['ner']))
    beams = nlp.entity.beam_parse(docs, beam_width=beam_width, beam_density=beam_density)

    for doc, beam in zip(docs, beams):
        # Sum the scores of all beam parses that contain a given
        # (start, end, label) entity.
        entity_scores = defaultdict(float)
        for score, ents in nlp.entity.moves.get_beam_parses(beam):
            for start, end, label in ents:
                entity_scores[(start, end, label)] += score

        # Print the candidates for this doc, ordered by start token.
        results = [
            {'start': start, 'end': end, 'label': label, 'prob': prob}
            for (start, end, label), prob in entity_scores.items()
        ]
        for result in sorted(results, key=lambda x: x['start']):
            print(result)

Output:

    {'start': 0, 'end': 1, 'label': 'PERSON', 'prob': 0.4054479906820232}
    {'start': 0, 'end': 1, 'label': 'ORG', 'prob': 0.01002015005487447}
    {'start': 0, 'end': 1, 'label': 'PRODUCT', 'prob': 0.0008592912552754791}
    {'start': 0, 'end': 1, 'label': 'WORK_OF_ART', 'prob': 0.0007666755792166002}
    {'start': 0, 'end': 1, 'label': 'NORP', 'prob': 0.00034931990870877333}
    {'start': 0, 'end': 1, 'label': 'TIME', 'prob': 0.0002786051849320804}
    {'start': 3, 'end': 4, 'label': 'ORG', 'prob': 0.9990115861687987}
    {'start': 3, 'end': 4, 'label': 'PRODUCT', 'prob': 0.0003378157477046507}
    {'start': 3, 'end': 4, 'label': 'FAC', 'prob': 8.249734411749544e-05}
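
Since the question asks for a cutoff, here is a minimal sketch of one way to apply it to the candidate list printed above. It does not call Spacy at all: it works on plain dicts in the same format, with a few rows copied from the output and rounded. The helper name `apply_cutoff` and the threshold value of 0.5 are assumptions chosen for illustration, not part of the original answer or of the Spacy API.

```python
# Candidate entities in the same format as the output above
# (a subset of the rows, with probabilities rounded for brevity).
candidates = [
    {'start': 0, 'end': 1, 'label': 'PERSON', 'prob': 0.4054},
    {'start': 0, 'end': 1, 'label': 'ORG', 'prob': 0.0100},
    {'start': 3, 'end': 4, 'label': 'ORG', 'prob': 0.9990},
    {'start': 3, 'end': 4, 'label': 'PRODUCT', 'prob': 0.0003},
]


def apply_cutoff(candidates, threshold):
    """Keep the best label per (start, end) span, if it reaches the threshold.

    Hypothetical helper for illustration only.
    """
    best = {}
    for cand in candidates:
        span = (cand['start'], cand['end'])
        # Remember only the highest-probability label for each span.
        if span not in best or cand['prob'] > best[span]['prob']:
            best[span] = cand
    ordered = sorted(best.values(), key=lambda c: c['start'])
    return [c for c in ordered if c['prob'] >= threshold]


THRESHOLD = 0.5  # assumed value; tune it on your own validation data
accepted = apply_cutoff(candidates, THRESHOLD)
for cand in accepted:
    print(cand)
```

With this threshold only the ORG entity at tokens 3 to 4 is kept, and the PERSON candidate at about 0.41 is dropped. Inside the Spacy loop, `doc[cand['start']:cand['end']].text` should give the text of an accepted entity, since `start` and `end` are token indices.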
