Train Spacy NER on Indian Names

Question

I am trying to customize Spacy's NER to identify Indian names. I am following this guide, https://spacy.io/usage/training, and this is the dataset I am using: https://gist.githubusercontent.com/mbejda/9b93c7545c9dd93060bd/raw/b582593330765df3ccaae6f641f8cddc16f1e879/Indian-Female-Names.csv
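
Before any annotation, the names have to be read out of that CSV. A minimal stdlib sketch, assuming the file has a header row with a `name` column (the actual header is not shown in the question, so check it first); `load_names` is a hypothetical helper:

```python
import csv
import io


def load_names(csv_file, column="name"):
    """Return the unique, non-empty values of `column` from a CSV file object.

    Assumes a header row containing `column`; check the real header first.
    """
    names = []
    seen = set()
    for row in csv.DictReader(csv_file):  # blank lines are skipped by DictReader
        name = (row.get(column) or "").strip()
        if name and name not in seen:  # drop blanks and repeated names
            seen.add(name)
            names.append(name)
    return names


# In-memory stand-in for the downloaded file (hypothetical rows):
sample = io.StringIO("name,gender\nShivani,f\nIsha,f\nShivani,f\n\n")
print(load_names(sample))  # ['Shivani', 'Isha']
```

For the real file, open it with `open("Indian-Female-Names.csv", newline="", encoding="utf-8")` and pass the file object to `load_names`.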

As per the code, I am supposed to provide training data in the following format:

TRAIN_DATA = [
    # (text, {'entities': [(start, end, label)]}); start is inclusive, end is exclusive
    ('Shivani', {
        'entities': [(0, 7, 'PERSON')]
    }),
    ('Isha', {
        'entities': [(0, 4, 'PERSON')]
    })
]
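
Entity offsets are character positions with an exclusive end, so `text[start:end]` must be exactly the name; an end that is one short, such as `(0, 6)` on `'Shivani'`, silently labels `'Shivan'`. A small stdlib sketch (the helper name `find_bad_spans` is hypothetical) that flags spans which are out of range, cut a word in half, or include surrounding spaces:

```python
def find_bad_spans(train_data):
    """Return (text, start, end, label, problem) for suspicious entity spans."""
    bad = []
    for text, annotations in train_data:
        for start, end, label in annotations["entities"]:
            if not 0 <= start < end <= len(text):
                bad.append((text, start, end, label, "out of range"))
                continue
            span = text[start:end]
            # A boundary that falls between two letters has cut a word in half.
            cuts_start = start > 0 and text[start - 1].isalnum() and text[start].isalnum()
            cuts_end = end < len(text) and text[end - 1].isalnum() and text[end].isalnum()
            if cuts_start or cuts_end or span != span.strip():
                bad.append((text, start, end, label, span))
    return bad


TRAIN_DATA = [
    ("Shivani", {"entities": [(0, 6, "PERSON")]}),  # one short: covers "Shivan"
    ("Shivani", {"entities": [(0, 7, "PERSON")]}),  # correct
    ("Isha ", {"entities": [(0, 3, "PERSON")]}),    # one short: covers "Ish"
]
for item in find_bad_spans(TRAIN_DATA):
    print(item)
# ('Shivani', 0, 6, 'PERSON', 'Shivan')
# ('Isha ', 0, 3, 'PERSON', 'Ish')
```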

How do I provide training data to Spacy for ~12000 names, given that manually specifying each entity would be a chore? Is there any other tool available to tag all the names?

Recommended answer

You are missing the point of training an NLP library for custom names. The training data has to be a list of training entries, each consisting of a sentence with the location of the name(s) in it identified. Please review the training data example again to see how you need to supply a full sentence and not just a name.
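
The offsets in such entries do not have to be typed by hand; they can be computed from the sentence. A minimal sketch, assuming each name occurs once per sentence (only the first whole-word match is labeled); `annotate` is a hypothetical helper, not part of spaCy:

```python
import re


def annotate(sentence, names, label="PERSON"):
    """Build one (text, {"entities": [...]}) entry by locating each name."""
    entities = []
    for name in names:
        # \b keeps "Isha" from matching inside a longer word such as "Ishaan".
        match = re.search(r"\b" + re.escape(name) + r"\b", sentence)
        if match is None:
            raise ValueError(f"{name!r} not found in {sentence!r}")
        entities.append((match.start(), match.end(), label))
    entities.sort()  # keep spans in text order
    return (sentence, {"entities": entities})


print(annotate("Shivani met Isha at the station.", ["Shivani", "Isha"]))
# ('Shivani met Isha at the station.', {'entities': [(0, 7, 'PERSON'), (12, 16, 'PERSON')]})
```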

Spacy is not meant to be a gazetteer matching tool. You are likely better off generating 100 sentences that use some of these names and then training Spacy on those annotated sentences. You can add more full-sentence examples as needed to increase accuracy. Spacy's native NER for names is robust and does not need 12000 examples.
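
One way to produce those sentences is to fill a few hand-written templates with names from the list and record where each name lands. A sketch under stated assumptions: the templates, the `generate_examples` helper, the sample names and the fixed seed are all illustrative, and more varied real sentences are likely to generalize better than a handful of templates.

```python
import random

# Hypothetical templates; each contains exactly one {name} placeholder.
TEMPLATES = [
    "{name} is visiting her parents this weekend.",
    "I had lunch with {name} yesterday.",
    "The report was written by {name}.",
    "Please forward the invoice to {name} before Friday.",
]


def generate_examples(names, templates, n=100, label="PERSON", seed=0):
    """Fill random templates with random names and record each name's span."""
    rng = random.Random(seed)  # fixed seed so the generated set is reproducible
    examples = []
    for _ in range(n):
        template = rng.choice(templates)
        name = rng.choice(names)
        prefix, suffix = template.split("{name}", 1)
        text = prefix + name + suffix
        start = len(prefix)  # the name begins right after the prefix
        examples.append((text, {"entities": [(start, start + len(name), label)]}))
    return examples


names = ["Shivani", "Isha", "Priya"]  # in practice, names read from the CSV
train_data = generate_examples(names, TEMPLATES, n=100)
print(len(train_data))  # 100
print(train_data[0])
```

Computing the start from the prefix length avoids searching the finished sentence, so a name that also appears inside the template text cannot be mislabeled.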

@ak_35's answer below provides examples of how to provide training sentences with the location of names labeled.
