Spacy.io Wikipedia Entity Linker - Resulting NLP Model Has No KB Entities


Question

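I have been learning how to use the spaCy entity linker by following the Wikipedia example.

I started with a small training run on 2,000 articles (it ran for 20 hours), but the resulting model fails to recognize or return any KB entities, even for text that was used in training.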

nlp_kb.from_disk("/path/to/nel-wikipedia/output_lt_kb80k_model_vsm/nlp") 

text = "Anarchism is a political philosophy and movement that rejects all involuntary, coercive forms of hierarchy. It calls for the abolition of the state which it holds to be undesirable, unnecessary and harmful. It is usually described alongside libertarian Marxism as the libertarian wing (libertarian socialism) of the socialist movement and as having a historical association with anti-capitalism and socialism. The history of anarchism goes back to prehistory, when some humans lived in anarchistic societies long before the establishment of formal states, realms or empires. With the rise of organised hierarchical bodies, skepticism toward authority also rose, but it was not until the 19th century that a self-conscious political movement emerged. During the latter half of the 19th and the first decades of the 20th century, the anarchist movement flourished in most parts of the world and had a significant role in workers' struggles for emancipation. Various anarchist schools of thought formed during this period. Anarchists have taken part in several revolutions, most notably in the Spanish Civil War, whose end marked the end of the classical era of anarchism. In the last decades of the 20th century and into the 21st century, the anarchist movement has been resurgent once more. Anarchism employs various tactics in order to meet its ideal ends; these can be broadly separated into revolutionary and evolutionary tactics."


doc = nlp_kb(text)
for ent in doc.ents:
    print(ent.text, ent.label_, ent.kb_id_)

Result

the 19th century DATE 
the latter half of the 19th and the first decades of the 20th century DATE 
Anarchists NORP 
the Spanish Civil War EVENT 
the last decades of the 20th century DATE 
the 21st century DATE

The NLP model has no entity linker in its pipeline.

nlp_kb.meta["pipeline"]
['tagger', 'parser', 'ner']

But meta.json does list it.

{
  "lang":"en",
  "name":"core_web_lg",
  "license":"MIT",
  "author":"Explosion",
  "url":"https://explosion.ai",
  "email":"contact@explosion.ai",
  "description":"English multi-task CNN trained on OntoNotes, with GloVe vectors trained on Common Crawl. Assigns word vectors, POS tags, dependency parses and named entities.",
  "sources":[
    {
      "name":"OntoNotes 5",
      "url":"https://catalog.ldc.upenn.edu/LDC2013T19",
      "license":"commercial (licensed by Explosion)"
    },
    {
      "name":"GloVe Common Crawl",
      "author":"Jeffrey Pennington, Richard Socher, and Christopher D. Manning",
      "url":"https://nlp.stanford.edu/projects/glove/",
      "license":"Public Domain Dedication and License v1.0"
    }
  ],
  "pipeline":[
    "tagger",
    "parser",
    "ner",
    "entity_linker"
  ],

Here are the contents of the nlp directory:

(spacy) ➜  nlp git:(master) ✗ ls
entity_linker meta.json     ner           parser        tagger        tokenizer     vocab

(spacy) ➜  nlp git:(master) ✗ ls -l entity_linker
total 55040
-rw-r--r--  1 staff       323 Sep  8 04:40 cfg
-rw-r--r--  1 staff  25294844 Sep  8 04:40 kb
-rw-r--r--  1 staff   2875799 Sep  8 04:40 model

I assume I am loading the model incorrectly, but I am not sure how to fix it.

Recommended answer

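You used this line:

nlp_kb.from_disk("/path/to/nel-wikipedia/output_lt_kb80k_model_vsm/nlp")

which loads the trained weights from disk into the existing nlp_kb object. However, it does not change the internal structure of that nlp_kb object, and it does not automatically add new components.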
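The distinction can be illustrated with a toy pipeline class. This is only a sketch of the behavior described here, not spaCy's implementation: ToyPipeline, save_toy_model, load_pipeline and the state.json file layout are all hypothetical names made up for the illustration.

```python
import json
import tempfile
from pathlib import Path


class ToyPipeline:
    """Hypothetical stand-in for a pipeline object (not spaCy's Language class)."""

    def __init__(self, component_names):
        # Every component starts out with empty (untrained) state.
        self.components = {name: None for name in component_names}

    def from_disk(self, path):
        # Restore state only for components that already exist on this object.
        # Saved directories for other components are silently ignored.
        for name in self.components:
            state_file = Path(path) / name / "state.json"
            if state_file.exists():
                self.components[name] = json.loads(state_file.read_text())
        return self


def save_toy_model(path, component_names):
    # Write a meta.json plus one state file per component (toy layout).
    root = Path(path)
    (root / "meta.json").write_text(json.dumps({"pipeline": component_names}))
    for name in component_names:
        (root / name).mkdir(parents=True, exist_ok=True)
        (root / name / "state.json").write_text(json.dumps({"trained": True}))


def load_pipeline(path):
    # Read the saved metadata first, build every listed component,
    # and only then restore the saved state.
    meta = json.loads((Path(path) / "meta.json").read_text())
    return ToyPipeline(meta["pipeline"]).from_disk(path)


with tempfile.TemporaryDirectory() as tmp:
    save_toy_model(tmp, ["tagger", "parser", "ner", "entity_linker"])
    existing = ToyPipeline(["tagger", "parser", "ner"]).from_disk(tmp)
    fresh = load_pipeline(tmp)
    print(list(existing.components))  # ['tagger', 'parser', 'ner']
    print(list(fresh.components))     # ['tagger', 'parser', 'ner', 'entity_linker']
```

In the toy, calling from_disk on an object that was built with three components leaves it with three components, even though a fourth one was saved next to them.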

Instead, what you want to do is:

nlp_el = spacy.load("/path/to/nel-wikipedia/output_lt_kb80k_model_vsm/nlp")

You should then have a new nlp object that includes the entity_linker component.
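To check whether a loaded pipeline is missing something that was saved, one option is to compare the "pipeline" list in meta.json with the names of the components that are actually loaded. The helper below is a minimal sketch using only the standard library; missing_components is a hypothetical name, and it assumes the saved model directory contains a meta.json with a "pipeline" key, as in the question.

```python
import json
from pathlib import Path


def missing_components(model_dir, loaded_pipe_names):
    """Return components listed in meta.json that are absent from the loaded pipeline.

    model_dir: directory that contains the saved meta.json.
    loaded_pipe_names: names of the components on the loaded object,
        for example the list returned by nlp.pipe_names in spaCy.
    """
    meta_path = Path(model_dir) / "meta.json"
    meta = json.loads(meta_path.read_text(encoding="utf8"))
    saved = meta.get("pipeline", [])
    # Keep the order from meta.json so the report is easy to read.
    return [name for name in saved if name not in loaded_pipe_names]
```

Called with the nlp directory from the question and the pipeline ['tagger', 'parser', 'ner'], this would be expected to return ['entity_linker']; for the object returned by spacy.load it should return an empty list.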
