Implementing a custom POS tagger in spaCy over the existing English model: NLP - Python
Problem description
I am trying to retrain the existing POS tagger in spaCy so that it assigns the proper tags to certain misclassified words, using the code below. But it gives me this error:
Warning: Unnamed vectors -- this won't allow multiple vectors models to be loaded. (Shape: (0, 0))
import spacy
from spacy.vocab import Vocab
from spacy.tokens import Doc
from spacy.gold import GoldParse
nlp = spacy.load('en_core_web_sm')
optimizer = nlp.begin_training()
vocab = Vocab(tag_map={})
doc = Doc(vocab, words=[word for word in ['ThermostatFailedOpen','ThermostatFailedClose','BlahDeBlah']])
gold = GoldParse(doc, tags=['NNP']*3)
nlp.update([doc], [gold], drop=0, sgd=optimizer)
Also, when I check again whether the tags are now assigned correctly, using the code below:
doc = nlp('If ThermostatFailedOpen moves from false to true, we are going to party')
for token in doc:
    print(token.text, token.lemma_, token.pos_, token.tag_, token.dep_,
          token.shape_, token.is_alpha, token.is_stop)
ThermostatFailedOpen thermostatfailedopen VERB VB nsubj XxxxxXxxxxXxxx True False
The words are still not classified correctly (as expected, I guess). Any insights on how to fix this?
Recommended answer
If you are using the same labels and just need to train the tagger better, there is no need to add new labels. However, if you are using a different set of labels, you need to train a new model.
For the first case, you call get_pipe('tagger'), skip the add_label loop, and keep going.
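A minimal sketch of the first case follows. It assumes a spaCy 2.x release from v2.1 onward (the question imports spacy.gold, which no longer exists in v3) and an installed en_core_web_sm model. Using nlp.resume_training() to get an optimizer without resetting the pretrained weights is an assumption made here, not something the answer states. The sentences and tags in TRAIN_DATA and the helper name update_existing_tagger are made up for illustration.

```python
import random

# Illustrative data (not from the original post): one fine-grained tag per
# whitespace-separated token, using only tags the English model already has.
TRAIN_DATA = [
    ("ThermostatFailedOpen is true", {"tags": ["NNP", "VBZ", "JJ"]}),
    ("ThermostatFailedClose is false", {"tags": ["NNP", "VBZ", "JJ"]}),
    ("BlahDeBlah is true", {"tags": ["NNP", "VBZ", "JJ"]}),
]


def update_existing_tagger(train_data, model_name="en_core_web_sm", n_iter=20):
    """Fine-tune the pretrained tagger in place (hypothetical helper, spaCy v2 API)."""
    import spacy  # imported lazily so the data above can be checked without spaCy

    nlp = spacy.load(model_name)
    # Same label set, so the existing tagger is reused and add_label is skipped.
    assert "tagger" in nlp.pipe_names
    other_pipes = [name for name in nlp.pipe_names if name != "tagger"]
    data = list(train_data)  # copy so the caller's list is not shuffled
    with nlp.disable_pipes(*other_pipes):  # only the tagger gets updated
        optimizer = nlp.resume_training()  # assumption: keeps pretrained weights
        for _ in range(n_iter):
            random.shuffle(data)
            losses = {}
            for text, annotations in data:
                # Raw text is tokenized with nlp's own vocab, so no separate
                # Vocab or Doc has to be built by hand.
                nlp.update([text], [annotations], drop=0.2, sgd=optimizer,
                           losses=losses)
    return nlp
```

Calling nlp = update_existing_tagger(TRAIN_DATA) and then re-running the check from the question shows whether the tag changed. With only a handful of examples it may not, and the tagger can also drift on ordinary words unless regular sentences are mixed into the training data.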
For the second case, you need to create a new tagger, train it, and then add it to the pipeline. To do this, you also need to disable the existing tagger when loading the model (since you will be training a new one). I've also answered this here
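The second case could be sketched as below, again assuming the spaCy 2.x API and an installed en_core_web_sm model. The tag set in TAG_MAP, the sentences in CUSTOM_TRAIN_DATA, and the helper name train_new_tagger are hypothetical and not taken from the answer. Treat it as an outline of the steps (load without the tagger, create a new one, add labels, train), not a verified recipe for a particular spaCy release.

```python
import random

# Hypothetical custom tag set, each tag mapped to a universal POS value.
TAG_MAP = {
    "SIGNAL": {"pos": "PROPN"},
    "V": {"pos": "VERB"},
    "J": {"pos": "ADJ"},
}

# Illustrative data: one custom tag per whitespace-separated token.
CUSTOM_TRAIN_DATA = [
    ("ThermostatFailedOpen is true", {"tags": ["SIGNAL", "V", "J"]}),
    ("ThermostatFailedClose is false", {"tags": ["SIGNAL", "V", "J"]}),
    ("BlahDeBlah is true", {"tags": ["SIGNAL", "V", "J"]}),
]


def train_new_tagger(train_data, tag_map, model_name="en_core_web_sm", n_iter=25):
    """Replace the pretrained tagger with a freshly trained one (spaCy v2 API)."""
    import spacy  # imported lazily so the data above can be checked without spaCy

    # Load the model without its tagger, since a new one is trained below.
    nlp = spacy.load(model_name, disable=["tagger"])
    tagger = nlp.create_pipe("tagger")
    for tag, values in tag_map.items():  # the add_label loop
        tagger.add_label(tag, values)
    nlp.add_pipe(tagger, first=True)

    other_pipes = [name for name in nlp.pipe_names if name != "tagger"]
    data = list(train_data)  # copy so the caller's list is not shuffled
    with nlp.disable_pipes(*other_pipes):
        # Only the new tagger is in the pipeline here, so only it is initialized.
        optimizer = nlp.begin_training()
        for _ in range(n_iter):
            random.shuffle(data)
            losses = {}
            for text, annotations in data:
                nlp.update([text], [annotations], sgd=optimizer, losses=losses)
    return nlp
```

If the other pretrained components are not needed, starting from spacy.blank('en') instead of a loaded model, as spaCy's own v2 tagger-training example does, is an alternative.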