Train spaCy's existing POS tagger with my own training examples


Question


I am trying to train the existing POS tagger on my own lexicon, not starting from scratch (I do not want to create an "empty model"). spaCy's documentation says "Load the model you want to start with", and the next step is "Add the tag map to the tagger using the add_label method". However, when I try to load the English small model and add the tag map, it throws this error:


ValueError: [T003] Resizing pre-trained Tagger models is not currently supported.

I would like to know how to fix this.


I have also seen Implementing custom POS Tagger in Spacy over existing english model : NLP - Python, but it suggests creating an "empty model", which is not what I want.


Also, spaCy's documentation is not very clear on whether we need a mapping dictionary (TAG_MAP) even if the tags in our training examples are the same as the universal dependency tags. Any thoughts?

from __future__ import unicode_literals, print_function
import plac
import random
from pathlib import Path
import spacy
from spacy.util import minibatch, compounding

TAG_MAP = {"noun": {"pos": "NOUN"}, "verb": {"pos": "VERB"}, "adj": {"pos": "ADJ"}, "adv": {"pos": "ADV"}}

TRAIN_DATA = [
    ('Afrotropical', {'tags': ['adj']}), ('Afrocentricity', {'tags': ['noun']}),
    ('Afrocentric', {'tags': ['adj']}), ('Afrocentrism', {'tags': ['noun']}),
    ('Anglomania', {'tags': ['noun']}), ('Anglocentric', {'tags': ['adj']}),
    ('apraxic', {'tags': ['adj']}), ('aglycosuric', {'tags': ['adj']}),
    ('asecretory', {'tags': ['adj']}), ('aleukaemic', {'tags': ['adj']}),
    ('agrin', {'tags': ['adj']}), ('Eurotransplant', {'tags': ['noun']}),
    ('Euromarket', {'tags': ['noun']}), ('Eurocentrism', {'tags': ['noun']}),
    ('adendritic', {'tags': ['adj']}), ('asynaptic', {'tags': ['adj']}),
    ('Asynapsis', {'tags': ['noun']}), ('ametabolic', {'tags': ['adj']})
]


@plac.annotations(
    lang=("ISO Code of language to use", "option", "l", str),
    output_dir=("Optional output directory", "option", "o", Path),
    n_iter=("Number of training iterations", "option", "n", int),
)
def main(lang="en", output_dir=None, n_iter=25):
    nlp = spacy.load('en_core_web_sm', disable=['ner', 'parser'])
    tagger = nlp.get_pipe('tagger')
    for tag, values in TAG_MAP.items():
        # this call raises ValueError [T003] on the pre-trained model
        tagger.add_label(tag, values)
    nlp.vocab.vectors.name = 'spacy_pretrained_vectors'
    optimizer = nlp.begin_training()
    for i in range(n_iter):
        random.shuffle(TRAIN_DATA)
        losses = {}
        # batch up the examples using spaCy's minibatch
        batches = minibatch(TRAIN_DATA, size=compounding(4.0, 32.0, 1.001))
        for batch in batches:
            texts, annotations = zip(*batch)
            nlp.update(texts, annotations, sgd=optimizer, losses=losses)
        print("Losses", losses)

    # test the trained model
    test_text = "I like Afrotropical apraxic blue eggs and Afrocentricity. A Eurotransplant is cool too. The agnathostomatous Euromarket and asypnapsis is even cooler. What about Eurocentrism?"
    doc = nlp(test_text)
    print("Tags", [(t.text, t.tag_, t.pos_) for t in doc])

    # save model to output directory
    if output_dir is not None:
        output_dir = Path(output_dir)
        if not output_dir.exists():
            output_dir.mkdir()
        nlp.to_disk(output_dir)
        print("Saved model to", output_dir)

        # test the saved model
        print("Loading from", output_dir)
        nlp2 = spacy.load(output_dir)
        doc = nlp2(test_text)
        print("Tags", [(t.text, t.tag_, t.pos_) for t in doc])


if __name__ == "__main__":
    plac.call(main)

Answer


The English model is trained on PTB tags, not UD tags. spaCy's tag map gives you a pretty good idea of the correspondences, but the PTB tagset is more fine-grained than the UD tagset:

https://github.com/explosion/spaCy/blob/master/spacy/lang/en/tag_map.py


Skip the tag_map-related code (the PTB -> UD mapping is already there in the model), change your tags in your data to PTB tags (NN, NNS, JJ, etc.), and then this script should run. (You'll still have to check whether it performs well, of course.)
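A minimal sketch of that change, assuming a simple custom-label-to-PTB mapping (NN for nouns, VB for verbs, JJ for adjectives, RB for adverbs; these are illustrative defaults, and plural nouns would need NNS, and so on):

```python
# Sketch: rewrite the training data to use PTB tags the English model
# already knows, so the tag_map / add_label code can be skipped.
# The mapping below is an assumption; pick the PTB tag that actually
# fits each word (e.g. NNS for plural nouns).
PTB_FOR_CUSTOM = {"noun": "NN", "verb": "VB", "adj": "JJ", "adv": "RB"}

def to_ptb(train_data):
    """Replace each example's custom tags with their PTB equivalents."""
    return [
        (text, {"tags": [PTB_FOR_CUSTOM[t] for t in ann["tags"]]})
        for text, ann in train_data
    ]

TRAIN_DATA = [
    ("Afrotropical", {"tags": ["adj"]}),
    ("Afrocentricity", {"tags": ["noun"]}),
]
print(to_ptb(TRAIN_DATA))
# [('Afrotropical', {'tags': ['JJ']}), ('Afrocentricity', {'tags': ['NN']})]
```

With the data in this form, the rest of the original script runs unchanged once the TAG_MAP loop is deleted: nlp.update still receives the same (text, {'tags': [...]}) pairs.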


In general, it's better to provide training examples with full phrases or sentences, since that's what spaCy will be tagging in real usage, like your test sentence.
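For instance, the single-word entries could become hypothetical sentence-level examples; the PTB tags below are illustrative, and the one constraint is that each token gets exactly one tag (spaCy's tokenization must line up with the tag list):

```python
# Hypothetical full-sentence training examples in the same
# (text, {"tags": [...]}) format: one PTB tag per token.
TRAIN_DATA = [
    ("I like Afrotropical eggs", {"tags": ["PRP", "VBP", "JJ", "NNS"]}),
    ("Eurocentrism is even cooler", {"tags": ["NN", "VBZ", "RB", "JJR"]}),
]

# Sanity check: the number of tags must match the number of tokens.
for text, ann in TRAIN_DATA:
    assert len(text.split()) == len(ann["tags"]), text
print("all examples aligned")
```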

