How to handle two entity extraction methods in NLP


Question

I am using two different entity extraction methods (https://rasa.com/docs/nlu/entities/) while building an NLP model for a chatbot in the Rasa framework. The bot should handle questions that contain custom entities as well as general ones such as location or organisation, so I use both the ner_spacy and ner_crf components to create the model. I then wrote a small helper script in Python to evaluate the model's performance, and noticed that the model struggles to choose the correct entity.

For example, for a word 'X' it chose the predefined entity 'ORG' from spaCy, but the word should have been recognized as a custom entity that I defined in the training data.

If I use only the ner_crf extractor, I have serious problems identifying location entities such as capitals. Another of my biggest problems is single-answer entities.

Q: "What's your favourite animal?"

A: Dog

My model is not able to extract the single entity 'animal' from this one-word answer. If I answer the question with two words, such as 'The Dog', the model has no problem extracting the animal entity with the value 'Dog'.

So my question is: is it sensible to use two different components to extract entities, one for custom entities and the other for predefined entities? If I use both, what mechanism in the model decides which extractor is used?

By the way, I am currently just testing things out, so my training set is not as large as it should be (fewer than 100 examples). Could the problem be solved by adding many more training examples?

Recommended answer

You are facing two problems here. Below are a few approaches that I found helpful.

1. Custom entity recognition: To solve this, you need to add more training sentences that cover entities of all the lengths you expect. ner_crf predicts better when there are identifiable markers around the entities (e.g. prepositions).
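One way to get that variety without writing every sentence by hand is to generate examples from templates. The following is a minimal sketch, not a definitive implementation. It assumes the Rasa NLU JSON training format (`rasa_nlu_data` / `common_examples` with character offsets `start` and `end`); the templates, the values, the intent name `inform_animal`, and the helper `make_example` are hypothetical and only serve as illustration.

```python
import json


def make_example(template, value, entity, intent):
    """Fill the `{}` slot in `template` with `value` and record its character span."""
    start = template.index("{}")          # position where the entity value begins
    text = template.replace("{}", value, 1)
    return {
        "text": text,
        "intent": intent,
        "entities": [
            {
                "start": start,
                "end": start + len(value),  # end offset is exclusive
                "value": value,
                "entity": entity,
            }
        ],
    }


# Hypothetical templates: a bare answer plus sentences with surrounding context words.
TEMPLATES = [
    "{}",
    "the {}",
    "I like the {}",
    "my favourite animal is the {}",
]

# Hypothetical entity values of different lengths (one and two tokens).
VALUES = ["dog", "polar bear", "blue whale"]

examples = [
    make_example(template, value, "animal", "inform_animal")
    for template in TEMPLATES
    for value in VALUES
]

# Assumed layout of a Rasa NLU JSON training file.
training_data = {"rasa_nlu_data": {"common_examples": examples}}

print(len(examples), "examples generated")
print(json.dumps(examples[0], indent=2))
```

Generated data like this is only a starting point; real user utterances are still needed so the model does not simply memorise the templates.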

2. Extracting entities from a single-word answer: As a workaround, I suggest doing the following manipulation on the client side.

When you send a question such as "What's your favorite animal?", prepend a marker to the question to tell the client that a single answer is expected. For example, you can send "##SINGLE## What's your favorite animal?" to the client.

The client can remove the ##SINGLE## marker from the question before showing it to the user. But when the client sends the user's response back to the server, it does not send "Dog"; it sends something like "User responded with single answer as Dog".

You can then train your model to extract entities from answers of that form.
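The client-side part of this workaround could look roughly like the sketch below. It is plain Python with no Rasa calls; the function names `prepare_question` and `prepare_reply` are hypothetical, and the marker and wrapper sentence are taken from the example above. It assumes the marker always appears at the start of the question.

```python
SINGLE_MARKER = "##SINGLE##"
WRAP_TEMPLATE = "User responded with single answer as {}"


def prepare_question(raw_question):
    """Strip the marker and report whether a single answer is expected.

    Returns a tuple (text_to_display, expects_single_answer).
    """
    if raw_question.startswith(SINGLE_MARKER):
        return raw_question[len(SINGLE_MARKER):].strip(), True
    return raw_question, False


def prepare_reply(user_reply, expects_single_answer):
    """Wrap the reply in a full sentence when a single answer was expected."""
    reply = user_reply.strip()
    if expects_single_answer:
        return WRAP_TEMPLATE.format(reply)
    return reply


# The server sends the marked question; the client shows the clean text.
question, single = prepare_question("##SINGLE## What's your favorite animal?")
print(question)

# The user types "Dog"; the client sends the wrapped sentence to the server.
print(prepare_reply("Dog", single))
```

The training data then needs examples of the wrapped sentence with the entity annotated at the end, so that the extractor sees the answer with consistent context words in front of it.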
