How to get probability of prediction per entity from Spacy NER model?
I used this official example code to train an NER model from scratch on my own training samples.
When I run this model on new text, I want to get the prediction probability for each entity.
# test the saved model
print("Loading from", output_dir)
nlp2 = spacy.load(output_dir)
for text, _ in TRAIN_DATA:
    doc = nlp2(text)
    print("Entities", [(ent.text, ent.label_) for ent in doc.ents])
    print("Tokens", [(t.text, t.ent_type_, t.ent_iob) for t in doc])
I cannot find a method in spaCy that returns the prediction probability for each entity.
How do I get this probability from spaCy? I need it so that I can apply a cutoff.
Getting per-entity prediction probabilities from a spaCy NER model is not trivial. Here is a solution adapted from here:
import spacy
from collections import defaultdict

texts = ['John works at Microsoft.']

# Number of alternate analyses to consider. More is slower, and not necessarily better -- you need to experiment on your problem.
beam_width = 16
# This clips solutions at each step. We multiply the score of the top-ranked action by this value, and use the result as a threshold. This prevents the parser from exploring options that look very unlikely, saving a bit of efficiency. Accuracy may also improve, because we've trained on a greedy objective.
beam_density = 0.0001

nlp = spacy.load('en_core_web_md')
# The NER component; the original code reached it through the nlp.entity shortcut.
ner = nlp.get_pipe('ner')

# Run the rest of the pipeline and leave NER to the beam search below.
docs = list(nlp.pipe(texts, disable=['ner']))
beams = ner.beam_parse(docs, beam_width=beam_width, beam_density=beam_density)

for doc, beam in zip(docs, beams):
    # Sum the scores of every beam parse that contains a given entity.
    entity_scores = defaultdict(float)
    for score, ents in ner.moves.get_beam_parses(beam):
        for start, end, label in ents:
            entity_scores[(start, end, label)] += score

    # Report this doc's candidate entities in order of start position.
    results = []
    for k, v in entity_scores.items():
        results.append({'start': k[0], 'end': k[1], 'label': k[2], 'prob': v})
    for a in sorted(results, key=lambda x: x['start']):
        print(a)
### Output: ####
{'start': 0, 'end': 1, 'label': 'PERSON', 'prob': 0.4054479906820232}
{'start': 0, 'end': 1, 'label': 'ORG', 'prob': 0.01002015005487447}
{'start': 0, 'end': 1, 'label': 'PRODUCT', 'prob': 0.0008592912552754791}
{'start': 0, 'end': 1, 'label': 'WORK_OF_ART', 'prob': 0.0007666755792166002}
{'start': 0, 'end': 1, 'label': 'NORP', 'prob': 0.00034931990870877333}
{'start': 0, 'end': 1, 'label': 'TIME', 'prob': 0.0002786051849320804}
{'start': 3, 'end': 4, 'label': 'ORG', 'prob': 0.9990115861687987}
{'start': 3, 'end': 4, 'label': 'PRODUCT', 'prob': 0.0003378157477046507}
{'start': 3, 'end': 4, 'label': 'FAC', 'prob': 8.249734411749544e-05}
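
Since the goal in the question is to apply a cutoff, the collected scores can be filtered afterwards. The following is a minimal sketch and not part of the original answer: it takes dicts shaped like the output above (a few rows copied and rounded from it), keeps the best-scoring label for each span, and drops spans that fall below a threshold. The function name `filter_entities` and the cutoff of 0.5 are assumptions chosen for illustration.

```python
def filter_entities(scored, threshold):
    """Keep the highest-scoring label per (start, end) span if it meets the threshold."""
    best = {}
    for ent in scored:
        span = (ent['start'], ent['end'])
        # Remember only the most probable label seen so far for this span.
        if span not in best or ent['prob'] > best[span]['prob']:
            best[span] = ent
    kept = [ent for ent in best.values() if ent['prob'] >= threshold]
    return sorted(kept, key=lambda x: x['start'])


# A few rows copied (and rounded) from the output above.
scored = [
    {'start': 0, 'end': 1, 'label': 'PERSON', 'prob': 0.4054},
    {'start': 0, 'end': 1, 'label': 'ORG', 'prob': 0.0100},
    {'start': 3, 'end': 4, 'label': 'ORG', 'prob': 0.9990},
    {'start': 3, 'end': 4, 'label': 'PRODUCT', 'prob': 0.0003},
]

THRESHOLD = 0.5  # hypothetical cutoff; tune it on your own data

for ent in filter_entities(scored, THRESHOLD):
    print(ent)
```

With this cutoff only the ORG entity at tokens 3 to 4 is kept; the PERSON reading of the first token scores about 0.41 and is dropped, which suggests a lower threshold may suit this model better.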