Keras merging class prediction with labels


Question

I am training a network for a multi-label classification problem, and I convert the class labels into a one-hot encoding.

After training the model and generating predictions, Keras simply outputs an array of values without specifying the class labels.

What is the best practice for merging these, so that my API can return meaningful results to the consumer?

Example

y = pd.get_dummies(df_merged.eventId)
y

2CBC9h3uple1SXxEVy8W    GiiFxmfrUwBNMGgFuoHo    e06onPbpyCucAGXw01mM
12  1                   0                       0
13  1                   0                       0
14  1                   0                       0

prediction = model.predict(pred_test_input)
prediction
array([[0.5002058 , 0.49697363, 0.50251794]], dtype=float32)

Desired result: {results: { 2CBC9h3uple1SXxEVy8W: 0.5002058, ...}}

Adding the model as requested in a comment, but this is just a toy model.

model = Sequential()
model.add(
  Embedding(
    input_dim=embeddings_index.shape[0],
    output_dim=embeddings_index.shape[1],
    weights=[embeddings_index],
    input_length=MAX_SEQ_LENGTH,
    trainable=False,
  )
)
model.add(LSTM(300))
model.add(Dense(units=len(y.columns), activation='sigmoid'))
model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])

Edit 2: adding y.

So my y is in the following format:

eventId
123
123
234
...

I then use y = pd.get_dummies(df_merged.eventId) to convert this into something the model can consume, and I would like to attach the eventIds back to the predictions.

Recommended answer

First of all, if you are doing multi-label classification, you should use the binary_crossentropy loss:

model.compile(loss='binary_crossentropy', optimizer='sgd', metrics=['accuracy'])

It is also important to note that Keras' accuracy metric does not account for multi-label classification, so it will be misleading. Per-class precision and recall are more appropriate metrics.
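As a rough illustration of per-class precision and recall, here is a minimal pure-Python sketch. The function name per_class_precision_recall, the 0.5 threshold, and the sample data are all made up for illustration and are not from the original post. In practice the rows could come from y.values.tolist() and model.predict(...).tolist().

```python
def per_class_precision_recall(y_true, y_pred, class_names, thresh=0.5):
    """Return {class_name: (precision, recall)} for multi-label predictions.

    y_true: rows of 0/1 labels, one column per class.
    y_pred: rows of sigmoid scores, same shape as y_true.
    """
    stats = {}
    for j, name in enumerate(class_names):
        tp = fp = fn = 0
        for truth_row, score_row in zip(y_true, y_pred):
            predicted = score_row[j] > thresh
            actual = truth_row[j] == 1
            if predicted and actual:
                tp += 1
            elif predicted:
                fp += 1
            elif actual:
                fn += 1
        # Report 0.0 when a class has no predicted or no actual positives.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        stats[name] = (precision, recall)
    return stats


# Hypothetical labels and scores for two classes, three samples.
class_names = ["a", "b"]
y_true = [[1, 0], [1, 1], [0, 1]]
y_pred = [[0.9, 0.2], [0.4, 0.8], [0.7, 0.6]]
metrics = per_class_precision_recall(y_true, y_pred, class_names)
```

Looking at these numbers class by class makes it easier to spot a class that the model never predicts, which a single averaged accuracy figure tends to hide.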

To get class predictions, you have to threshold each class's predicted score. The threshold is something you have to tune (it does not have to be the same for every class), for example:

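# Column order of the one-hot frame matches the order of the model outputs
class_names = y.columns.tolist()
pred_classes = {}
# predict() returns shape (n_samples, n_classes); take the single sample
preds = model.predict(pred_test_input)[0]

thresh = 0.5  # tune this value, possibly per class
for i, name in enumerate(class_names):
    if preds[i] > thresh:
        # float() turns the numpy float32 into a JSON-friendly Python float
        pred_classes[name] = float(preds[i])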

This produces the pred_classes dictionary, which contains the classes whose score exceeds the threshold, together with a confidence score for each.
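The thresholding described above can be extended to a whole batch and to per-class thresholds, producing the {results: {...}} shape asked for in the question. The following is only a sketch: label_predictions is a hypothetical helper, and the scores are the ones shown in the question. With a real model you would pass model.predict(...).tolist() and y.columns.tolist().

```python
import json


def label_predictions(preds, class_names, thresholds=0.5):
    """Map each row of scores to {"results": {class_name: score}}.

    thresholds may be a single number or one value per class.
    """
    if isinstance(thresholds, (int, float)):
        thresholds = [thresholds] * len(class_names)
    payloads = []
    for row in preds:
        results = {
            name: float(score)
            for name, score, t in zip(class_names, row, thresholds)
            if score > t  # keep only classes above their threshold
        }
        payloads.append({"results": results})
    return payloads


# Class names and scores taken from the example in the question.
class_names = [
    "2CBC9h3uple1SXxEVy8W",
    "GiiFxmfrUwBNMGgFuoHo",
    "e06onPbpyCucAGXw01mM",
]
preds = [[0.5002058, 0.49697363, 0.50251794]]

payloads = label_predictions(preds, class_names)
body = json.dumps(payloads[0])  # JSON string an API could return
```

Keeping the class names from y.columns next to the saved model matters here, because the mapping relies on the output order matching the column order used at training time.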
