How to handle two entity extraction methods in NLP

Posted 2022/1/2 18:02:17. Tags: nlp, entity, rasa-nlu

Problem description

I am using two different entity extraction methods (https://rasa.com/docs/nlu/entities/) while building the NLP model for my chatbot in the RASA framework. The bot should handle different questions, some with custom entities and some with general ones such as locations or organisations, so I use both the ner_spacy and ner_crf components in the model. I then built a small helper script in Python to evaluate the model's performance, and noticed that the model struggles to choose the correct entity.
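The helper script itself is not shown. As a hypothetical sketch of such an evaluation (not the author's script), the Python below compares gold and predicted entities by exact character span and label, assuming each entity is a dict with `start`, `end` and `entity` keys like those in Rasa NLU's parse output. It also counts label confusions, i.e. the right span with the wrong label, which is exactly the "custom entity tagged as `ORG`" failure:

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Entity = Dict[str, object]  # assumed keys: "start", "end", "entity" (plus "value", ...)


def span_key(ent: Entity) -> Tuple[int, int, str]:
    """Identify an entity by its character span and label."""
    return (int(ent["start"]), int(ent["end"]), str(ent["entity"]))


def evaluate_entities(examples: Iterable[Tuple[List[Entity], List[Entity]]]) -> Dict[str, object]:
    """Exact-match precision, recall and F1 over (gold, predicted) pairs.

    Also counts label confusions: a predicted entity whose span matches a gold
    entity exactly but whose label differs, keyed as (gold_label, predicted_label).
    """
    tp = fp = fn = 0
    confusions: Counter = Counter()
    for gold, predicted in examples:
        gold_keys = {span_key(e) for e in gold}
        pred_keys = {span_key(e) for e in predicted}
        tp += len(gold_keys & pred_keys)
        fp += len(pred_keys - gold_keys)
        fn += len(gold_keys - pred_keys)
        gold_labels = {(start, end): label for start, end, label in gold_keys}
        for start, end, label in pred_keys:
            expected = gold_labels.get((start, end))
            if expected is not None and expected != label:
                confusions[(expected, label)] += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "confusions": confusions}
```

For "The Dog" annotated as `animal` at characters 4 to 7 but predicted as `ORG`, precision and recall are both 0 and `confusions[("animal", "ORG")]` is 1, which points straight at the extractor conflict rather than at a missed entity.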

For example, for a word 'X' it chose the pre-defined entity 'ORG' from spaCy, but it should be recognized as a custom entity that I defined in the training data.

If I use only the ner_crf extractor, I have major problems identifying location entities such as capital cities. Another of my biggest problems is single-answer entities:

Q : "What's your favourite animal?"

A : "Dog"

My model is not able to extract the 'animal' entity from this single-word answer. If I answer the question with two words, like 'The Dog', the model has no problem extracting the animal entity with the value 'Dog'.

So my question is: is it sensible to use two different components to extract entities, one for custom entities and the other for pre-defined ones? And if I use both, what mechanism in the model decides which extractor's result is used?
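As far as I know, Rasa NLU does not pick a single winner here: every extractor in the pipeline appends its own entities to the parse result, and each entity carries an `extractor` field naming the component that produced it, so conflicting answers from ner_spacy and ner_crf can both appear. Resolving them is then a post-processing choice. The sketch below is an assumption, not a built-in Rasa feature: it keeps everything not produced by `ner_spacy`, keeps spaCy entities only for a hypothetical whitelist of labels (`GPE`, `LOC`), and drops any spaCy entity that overlaps an entity from another extractor:

```python
from typing import Dict, List, Set

# Hypothetical whitelist: spaCy labels we trust more than our own CRF.
SPACY_LABELS_TO_KEEP: Set[str] = {"GPE", "LOC"}


def overlaps(a: Dict, b: Dict) -> bool:
    """True if two entities share at least one character (end is exclusive)."""
    return a["start"] < b["end"] and b["start"] < a["end"]


def merge_entities(entities: List[Dict],
                   pretrained_extractor: str = "ner_spacy",
                   keep_labels: Set[str] = SPACY_LABELS_TO_KEEP) -> List[Dict]:
    """Resolve entities produced by several extractors for one message.

    Entities from any extractor other than `pretrained_extractor` (e.g. ner_crf)
    are kept as-is. Pre-trained entities survive only if their label is
    whitelisted and they do not overlap an entity from another extractor.
    """
    custom = [e for e in entities if e.get("extractor") != pretrained_extractor]
    merged = list(custom)
    for ent in entities:
        if ent.get("extractor") != pretrained_extractor:
            continue  # already kept above
        if ent["entity"] not in keep_labels:
            continue  # e.g. drop spaCy's ORG guess for a custom term
        if any(overlaps(ent, c) for c in custom):
            continue  # custom extractor wins on conflicting spans
        merged.append(ent)
    return sorted(merged, key=lambda e: e["start"])
```

For a message like "X ships to Berlin", where spaCy tags 'X' as `ORG`, ner_crf tags it with a hypothetical custom label `product`, and spaCy tags 'Berlin' as `GPE`, the result keeps `product` and `GPE`. Letting the custom extractor win on overlapping spans targets the 'X' case, while spaCy can still supply locations that ner_crf misses.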

By the way, I'm currently just testing things out, so my training set is not as large as it should be (fewer than 100 examples). Could the problem be solved with many more training examples?
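More data is likely to help most if it covers the failing pattern. One plausible reason single-word answers fail is that no training example consists of the entity alone, so the CRF has never seen a one-token message labelled as `animal`. A hypothetical sketch for generating such examples from templates in the JSON training format used by Rasa NLU (`rasa_nlu_data` / `common_examples`); the templates, values and the intent name `inform_animal` are made up for illustration:

```python
import json
from itertools import product
from typing import Dict, List

# Hypothetical templates; "{animal}" marks where the entity value goes.
# The bare "{animal}" template produces single-word answers like "dog".
TEMPLATES = ["{animal}", "the {animal}", "I like {animal}", "my favourite animal is the {animal}"]
ANIMALS = ["dog", "cat", "horse"]  # hypothetical entity values


def make_example(template: str, value: str, entity: str, intent: str) -> Dict:
    """Fill one template and record the entity's character offsets."""
    placeholder = "{" + entity + "}"
    start = template.index(placeholder)  # text before the placeholder is unchanged
    text = template.replace(placeholder, value, 1)
    return {
        "text": text,
        "intent": intent,
        "entities": [
            {"start": start, "end": start + len(value), "value": value, "entity": entity}
        ],
    }


def build_training_data(templates: List[str], values: List[str],
                        entity: str = "animal",
                        intent: str = "inform_animal") -> Dict:
    """Cross every template with every value (hypothetical intent name)."""
    examples = [make_example(t, v, entity, intent) for t, v in product(templates, values)]
    return {"rasa_nlu_data": {"common_examples": examples}}


if __name__ == "__main__":
    print(json.dumps(build_training_data(TEMPLATES, ANIMALS), indent=2))
```

Templated data is repetitive, so it is best mixed with real user phrasings rather than used on its own.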