How does spaCy use word embeddings for Named Entity Recognition (NER)?

Question

I'm trying to train an NER model using spaCy to identify locations, (person) names, and organisations. I'm trying to understand how spaCy recognises entities in text, and I haven't been able to find an answer. From this issue on GitHub and this example, it appears that spaCy uses a number of features present in the text, such as POS tags, prefixes, suffixes, and other character- and word-based features, to train an Averaged Perceptron.

However, nowhere in the code does it appear that spaCy uses the GloVe embeddings (although each word in the sentence/document appears to have one, if it is present in the GloVe corpus).

My questions are:

  1. Are these used in the NER system now?
  2. If I were to switch out the word vectors for a different set, should I expect performance to change in a meaningful way?
  3. Where in the code can I find out how (if at all) spaCy is using the word vectors?

I've tried looking through the Cython code, but I'm not able to tell whether the labelling system uses word embeddings.

Recommended answer

spaCy does use word embeddings for its NER model, which is a multilayer CNN. Matthew Honnibal, the creator of spaCy, made quite a nice video about how its NER works here. All three English models use GloVe vectors trained on Common Crawl, but the smaller models "prune" the number of vectors by mapping similar words to the same vector (link).
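To make the "pruning" idea concrete, here is a minimal pure-Python sketch of mapping less frequent words onto the closest vector that is kept. It is only a toy illustration of the concept, not spaCy's actual implementation; the function names `cosine` and `prune`, the toy words, and the 3-dimensional vectors are all made up for this example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def prune(vectors, keep):
    """Keep the first `keep` words' vectors and remap every other word
    to the most similar kept word (hypothetical helper, toy logic).

    `vectors` is a dict of word -> vector, assumed to be ordered from
    most to least frequent.
    """
    words = list(vectors)
    kept = words[:keep]
    remap = {}
    for word in words:
        if word in kept:
            remap[word] = word
        else:
            # Share the vector of the nearest kept word.
            remap[word] = max(kept, key=lambda k: cosine(vectors[word], vectors[k]))
    return remap


# Made-up toy vectors, most frequent words first.
toy = {
    "cat": [1.0, 0.1, 0.0],
    "car": [0.0, 0.1, 1.0],
    "kitten": [0.9, 0.2, 0.1],
    "truck": [0.1, 0.0, 0.9],
}

remap = prune(toy, keep=2)
print(remap)
```

With only two vectors stored, "kitten" ends up sharing the vector of "cat" and "truck" shares the vector of "car", so the vector table shrinks while every word can still be looked up.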

It's quite doable to add custom vectors. There's an overview of the process in the spaCy docs, plus some example code on GitHub.
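As a rough sketch of what adding custom vectors can look like, the snippet below attaches a few vectors to a blank English pipeline with `Vocab.set_vector`. The words and the 4-dimensional values are invented for illustration; real vectors would come from your own embedding file. Note that this only makes the vectors available on the vocabulary. Whether an NER component actually benefits from them depends on the model being trained with those vectors, which this sketch does not cover.

```python
import numpy as np
import spacy

# A blank pipeline needs no model download; it only tokenizes.
nlp = spacy.blank("en")

# Made-up 4-dimensional vectors, purely for illustration.
custom_vectors = {
    "london": np.asarray([0.9, 0.1, 0.0, 0.2], dtype="float32"),
    "paris": np.asarray([0.8, 0.2, 0.1, 0.1], dtype="float32"),
    "acme": np.asarray([0.0, 0.9, 0.7, 0.3], dtype="float32"),
}

# Register each vector on the shared vocabulary.
for word, vector in custom_vectors.items():
    nlp.vocab.set_vector(word, vector)

# Tokens whose text matches a registered word now expose that vector.
doc = nlp("london paris zzzunknown")
for token in doc:
    print(token.text, token.has_vector, token.vector.shape)
```

Swapping in a different set of vectors follows the same pattern: load them into the vocabulary first, then train (or retrain) the NER model so that it learns with those vectors as input.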
