How to handle two entity extraction methods in NLP

Question

I am using two different entity extraction methods (https://rasa.com/docs/nlu/entities/) while building an NLP model in the RASA framework for a chatbot. The bot should handle different questions that contain custom entities as well as some general ones, like location or organisation. So I use both components, ner_spacy and ner_crf, to create the model. After that I built a small helper script in Python to evaluate the model's performance. There I noticed that the model struggles to choose the correct entity.

For example, for a word 'X' it chose the pre-defined entity 'ORG' from spaCy, but it should have been recognised as a custom entity that I defined in the training data.

If I use only the ner_crf extractor, I face huge problems identifying location entities like capitals. Another of my biggest problems is single-answer entities.

Q: "What's your favourite animal?"

A: Dog

My model is not able to extract the single entity 'animal' from this one-word answer. If I answer the question with two words, like 'The Dog', the model has no problem extracting the animal entity with the value 'Dog'.

So my question is: is it sensible to use two different components to extract entities, one for custom entities and the other for pre-defined entities? If I use two methods, what mechanism in the model decides which extractor is used?

By the way, I'm currently just testing things out, so my training set is not as large as it should be (fewer than 100 examples). Could the problem be solved if I had many more training examples?

Recommended answer

You are facing two problems here. I am suggesting a few approaches that I found helpful.

1. Custom entity recognition: To solve this, you need to add more training sentences that cover all possible lengths of entities. ner_crf predicts better when there are identifiable markers around entities (e.g. prepositions).

2. Extracting entities from single-word answers: As a workaround, I suggest doing the following manipulation on the client side.

When you send a question like What's your favorite animal?, prepend a marker to the question to indicate to the client that a single answer is expected. E.g. you can send ##SINGLE## What's your favorite animal? to the client.

The client can remove the ##SINGLE## marker from the question and show it to the user. But when the client sends the user's response to the server, it doesn't send Dog; it sends something like User responded with single answer as Dog.
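A minimal sketch of that client-side handling, assuming the marker always sits at the start of the question. The function names `prepare_question` and `prepare_reply` are hypothetical helpers for illustration and not part of Rasa:

```python
# Marker and sentence template taken from the answer text above.
SINGLE_MARKER = "##SINGLE##"
TEMPLATE = "User responded with single answer as {answer}"


def prepare_question(raw_question):
    """Strip the marker, returning (text_to_display, expects_single_answer)."""
    if raw_question.startswith(SINGLE_MARKER):
        return raw_question[len(SINGLE_MARKER):].strip(), True
    return raw_question, False


def prepare_reply(user_reply, expects_single_answer):
    """Wrap the user's reply in a full sentence when a single answer was expected."""
    reply = user_reply.strip()
    if expects_single_answer:
        return TEMPLATE.format(answer=reply)
    return reply


question, single = prepare_question("##SINGLE## What's your favorite animal?")
print(question)                      # shown to the user, without the marker
print(prepare_reply("Dog", single))  # sent to the server instead of "Dog"
```

The wrapped sentence gives the extractor some surrounding context, which a bare one-word reply lacks.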

You can train your model to extract entities from such an answer.
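As a rough illustration, training examples for these wrapped answers could be generated as below. This is a sketch under assumptions: the JSON layout (`rasa_nlu_data` / `common_examples` with character offsets `start` and `end`) follows my recollection of the legacy Rasa NLU training format and should be checked against the version you use; the intent name `inform` and the helper `make_example` are made up for the example. Entity values of different lengths are included on purpose, in line with point 1.

```python
import json

# Sentence template the client wraps single answers in (from the answer above).
TEMPLATE = "User responded with single answer as {answer}"


def make_example(answer, entity, intent):
    """Build one training example with character offsets for the entity."""
    text = TEMPLATE.format(answer=answer)
    start = TEMPLATE.index("{answer}")  # the entity begins where the placeholder was
    return {
        "text": text,
        "intent": intent,  # hypothetical intent name
        "entities": [
            {
                "start": start,
                "end": start + len(answer),
                "value": answer,
                "entity": entity,
            }
        ],
    }


# Mix one-word and multi-word values so the model sees several entity lengths.
answers = ["Dog", "Cat", "Guinea pig", "Golden retriever"]
examples = [make_example(a, "animal", "inform") for a in answers]

# Assumed legacy Rasa NLU JSON layout; verify against your Rasa version.
training_data = {"rasa_nlu_data": {"common_examples": examples}}
print(json.dumps(training_data, indent=2))
```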
