Dealing with textual data for classification


Question

Assume the input data consists of discrete values together with a string of text, and the output should be a set of tags.

I'm having trouble figuring out how to handle the textual input when turning this into data that can be fed into a neural net.

With only the textual input, I assume an RNN that produces a thought vector could work. I am, however, unsure how to feed in the rest of the input data alongside it.

Recommended answer

If you use an RNN to handle the textual input, you can concatenate the RNN's output with a one-hot encoding of your discrete features. The concatenated vector can then be fed into an output layer (for example, a logistic layer trained with cross-entropy loss across the multiple labels).
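The idea can be sketched without any framework. The snippet below is only an illustration under stated assumptions: the sizes (`VOCAB`, `EMB`, `HIDDEN`, `FEATURE_SIZES`, `NUM_TAGS`), the plain Elman-style RNN, the random untrained weights, and every function name are hypothetical and not part of the original answer. It shows the forward pass only: encode the text, one-hot encode the discrete values, concatenate, and apply one sigmoid per tag.

```python
import math
import random

random.seed(0)

def rand_matrix(rows, cols):
    """Small random weights; a real model would learn these by training."""
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]

def rnn_encode(token_ids, emb, w_xh, w_hh):
    """Elman RNN; the final hidden state is used as the text vector."""
    h = [0.0] * len(w_hh)
    for t in token_ids:
        x = emb[t]
        h = [math.tanh(a + b) for a, b in zip(matvec(w_xh, x), matvec(w_hh, h))]
    return h

# Hypothetical sizes, chosen only for illustration.
VOCAB, EMB, HIDDEN = 50, 8, 16
FEATURE_SIZES = [4, 3]   # two discrete features with 4 and 3 categories
NUM_TAGS = 5

emb = rand_matrix(VOCAB, EMB)
w_xh = rand_matrix(HIDDEN, EMB)
w_hh = rand_matrix(HIDDEN, HIDDEN)
w_out = rand_matrix(NUM_TAGS, HIDDEN + sum(FEATURE_SIZES))

def predict_tags(token_ids, discrete_values):
    text_vec = rnn_encode(token_ids, emb, w_xh, w_hh)
    discrete_vec = []
    for value, size in zip(discrete_values, FEATURE_SIZES):
        discrete_vec += one_hot(value, size)
    combined = text_vec + discrete_vec            # list concatenation
    logits = matvec(w_out, combined)
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]  # one sigmoid per tag

def multilabel_loss(probs, targets):
    """Mean binary cross-entropy over the tags."""
    return -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                for p, t in zip(probs, targets)) / len(probs)

probs = predict_tags([3, 17, 42], [2, 0])
loss = multilabel_loss(probs, [1, 0, 0, 1, 0])
```

In a real project the same structure would normally be written with a deep learning framework so that the RNN and output weights are trained jointly by backpropagation; only the concatenation step is the point here.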

Similarly, if you use an embedding layer to map the input text, you can learn a separate embedding for your discrete features as well. The two families of embedded features can then be concatenated and fed into the output layers.
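A minimal, framework-free sketch of the embedding variant follows. All names and sizes are hypothetical, the tables hold random values standing in for learned weights, and mean pooling of the token embeddings is an assumption made here to get a fixed-size text vector (the answer does not say how the embedded text is reduced).

```python
import random

random.seed(0)

def embedding_table(num_rows, dim):
    """Lookup table with one vector per category; learned in a real model."""
    return [[random.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(num_rows)]

def mean_pool(vectors):
    """Average a list of equal-length vectors element-wise."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

# Hypothetical sizes, chosen only for illustration.
VOCAB, TEXT_DIM = 50, 8
FEATURE_SIZES = [4, 3]   # two discrete features with 4 and 3 categories
FEATURE_DIM = 3

text_emb = embedding_table(VOCAB, TEXT_DIM)
# One separate embedding table per discrete feature.
feature_embs = [embedding_table(size, FEATURE_DIM) for size in FEATURE_SIZES]

def build_input(token_ids, discrete_values):
    text_vec = mean_pool([text_emb[t] for t in token_ids])
    feature_vec = []
    for table, value in zip(feature_embs, discrete_values):
        feature_vec += table[value]          # look up the category's vector
    return text_vec + feature_vec            # concatenate both feature families

x = build_input([3, 17, 42], [2, 0])
print(len(x))  # 8 + 2 * 3 = 14
```

The resulting vector `x` would then go into the output layers, for instance one sigmoid unit per tag. Compared with one-hot encoding, learned embeddings keep the input small when a discrete feature has many categories.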
