Longest match only with spaCy PhraseMatcher

Views: 69 | Published: 2021/6/7 20:39:18 | Tags: python, nlp, spacy, named-entity-recognition, ner

Problem description

I have created a spaCy PhraseMatcher to match names in a document, following the tutorial. I want to use the resulting matches as additional training data for a spaCy NER model. My patterns, however, contain both full names (e.g. 'Barack Obama') and last names ('Obama') as separate entries.

Hence, in a sentence that contains 'Barack Obama', both patterns match, which produces overlapping matches. This overlap triggers an exception when I try to use the data for training, e.g.:

ValueError: [E103] Trying to set conflicting doc.ents: '(19, 33, 'PERSON')' and '(29, 33, 'PERSON')'. A token can only be part of one entity, so make sure the entities you're setting don't overlap.
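
To make the conflict concrete, here is a minimal sketch in plain Python that finds annotations sharing at least one position. The helper name `find_conflicts` and the third annotation are hypothetical and only serve the illustration; the first two tuples are the offsets from the error message above.

```python
from itertools import combinations


def find_conflicts(entities):
    """Return pairs of (start, end, label) annotations that overlap."""
    conflicts = []
    for a, b in combinations(sorted(entities), 2):
        # Half-open intervals [start, end) overlap when each one
        # starts before the other one ends.
        if a[0] < b[1] and b[0] < a[1]:
            conflicts.append((a, b))
    return conflicts


# The first two tuples come from the E103 message; the third is made up.
entities = [(19, 33, "PERSON"), (29, 33, "PERSON"), (40, 46, "PERSON")]
conflicts = find_conflicts(entities)
print(conflicts)
```

Only the first two annotations are reported, because the shorter span lies entirely inside the longer one.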

I have considered filtering out the overlapping matches before using the data for training, but that seems very inefficient and I expect it to add significant processing time on large datasets.
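
The filtering step described above could look roughly like the following sketch. It assumes the matches are `(match_id, start, end)` tuples with token offsets, which is the shape PhraseMatcher returns; the function name `keep_longest` and the string IDs are hypothetical (real match IDs are integer hashes). spaCy also ships `spacy.util.filter_spans`, which applies a similar longest-first rule to `Span` objects, so this is an illustration of the idea rather than a replacement for it.

```python
def keep_longest(matches):
    """Drop any match that overlaps a longer match.

    `matches` is an iterable of (match_id, start, end) tuples with
    half-open token offsets.
    """
    # Longest first; among equal lengths, the earlier start wins.
    ordered = sorted(matches, key=lambda m: (m[2] - m[1], -m[1]), reverse=True)
    kept = []
    seen = set()  # token indices already claimed by a kept match
    for match_id, start, end in ordered:
        if any(i in seen for i in range(start, end)):
            continue  # overlaps a longer (or earlier) match
        kept.append((match_id, start, end))
        seen.update(range(start, end))
    # Return the survivors in document order.
    return sorted(kept, key=lambda m: m[1])


# Hypothetical matches: a full name at tokens 4-6 that contains a
# last-name match at 5-6, plus a standalone last name at 9-10.
matches = [("FULL_NAME", 4, 6), ("LAST_NAME", 5, 6), ("LAST_NAME", 9, 10)]
longest = keep_longest(matches)
print(longest)
```

The work per document is one sort plus one pass over the matches, so the cost grows with the number of matches in that document rather than with the size of the whole corpus. Whether that is acceptable for a given dataset would need to be measured.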

Is there a way to set up a PhraseMatcher so that, when matches overlap, it returns only the longest one?
