Multi-label classification with class weights in Keras


Problem description

I have 1000 classes in the network, with multi-label outputs. For each training example, the number of positive outputs is the same (i.e. 10), but they can be assigned to any of the 1000 classes. So 10 classes have output 1 and the remaining 990 have output 0.

For the multi-label classification, I am using binary cross-entropy as the cost function and sigmoid as the activation function. When I tried the rule of 0.5 as the cut-off for 1 or 0, all of the outputs were 0. I understand this is a class imbalance problem. From this link, I understand that I might have to create extra output labels. Unfortunately, I haven't been able to figure out how to incorporate that into a simple neural network in Keras.

import keras
from keras.layers import Input, Dense
from keras.models import Model

nclasses = 1000

# if we wanted to maximize an imbalance problem!
#class_weight = {k: len(Y_train)/(nclasses*(Y_train==k).sum()) for k in range(nclasses)}

inp = Input(shape=[X_train.shape[1]])
x = Dense(5000, activation='relu')(inp)
x = Dense(4000, activation='relu')(x)
x = Dense(3000, activation='relu')(x)
x = Dense(2000, activation='relu')(x)
x = Dense(nclasses, activation='sigmoid')(x)
model = Model(inputs=[inp], outputs=[x])

# note: the class is Adam (capitalized), and the instance must be passed to
# compile -- passing the string 'adam' would silently ignore the custom lr
adam = keras.optimizers.Adam(lr=0.00001)
model.compile(optimizer=adam, loss='binary_crossentropy')
history = model.fit(
    X_train, Y_train, batch_size=32, epochs=50, verbose=0, shuffle=False)

Could anyone help me with the code here? I would also highly appreciate it if you could suggest a good 'accuracy' metric for this problem.

Thanks a lot :) :)

Answer

I have a similar problem and unfortunately have no answer for most of the questions, especially the class imbalance problem.

In terms of metrics there are several possibilities: in my case I use the top 1/2/3/4/5 results and check whether one of them is right. Because in your case you always have the same number of labels = 1, you could take your top 10 results, see what percentage of them are right, and average this result over your batch. I didn't find a way to include this algorithm as a Keras metric; instead, I wrote a callback which calculates the metric at epoch end on my validation data set.
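The top-10 check described above can be sketched in plain numpy (a hypothetical helper, not a built-in Keras metric); it could be called from a `Callback`'s `on_epoch_end` on the validation set:

```python
import numpy as np

def top_k_precision(y_true, y_pred, k=10):
    """Fraction of the top-k scored classes per sample that are true
    positives, averaged over the batch (hypothetical helper name)."""
    # indices of the k highest scores for each sample
    top_k = np.argsort(y_pred, axis=1)[:, -k:]
    # 1 where a top-k class is actually a positive label, 0 otherwise
    hits = np.take_along_axis(y_true, top_k, axis=1)
    return hits.mean()
```

With 10 positives per row and k=10 this is both the precision and the recall of the top-10 prediction.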

Also, if you predict the top n results on a test dataset, look at how many times each class is predicted. The Counter class is really convenient for this purpose.
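A minimal sketch of that counting step, with toy sigmoid scores (all values hypothetical):

```python
import numpy as np
from collections import Counter

# scores: one row of sigmoid outputs per test sample (toy data)
scores = np.array([[0.9, 0.1, 0.8, 0.2],
                   [0.7, 0.6, 0.1, 0.3],
                   [0.2, 0.9, 0.8, 0.1]])
n = 2  # take the top-n classes per sample

top_n = np.argsort(scores, axis=1)[:, -n:]   # indices of the n highest scores
counts = Counter(top_n.ravel().tolist())     # how often each class is predicted
```

A class that dominates the counts while others never appear is a quick sign the imbalance is hurting the predictions.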

I found a method to include class weights without splitting the output. You need a numpy 2d array containing weights with shape [number of classes to predict, 2] (background and signal). Such an array can be calculated with the following function:

import numpy as np
from sklearn.utils.class_weight import compute_class_weight

def calculating_class_weights(y_true):
    number_dim = np.shape(y_true)[1]
    weights = np.empty([number_dim, 2])
    for i in range(number_dim):
        # 'balanced' weights each label inversely proportional to its frequency
        # in this output column (keyword arguments required in recent sklearn)
        weights[i] = compute_class_weight(class_weight='balanced',
                                          classes=np.array([0., 1.]),
                                          y=y_true[:, i])
    return weights
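To illustrate what this returns, here is a pure-numpy check with toy labels that applies the same n_samples / (n_classes * count) formula sklearn's 'balanced' mode uses (no sklearn dependency, hypothetical data):

```python
import numpy as np

# Toy multi-hot labels: 10 samples x 3 classes; class 0 has 1 positive,
# class 1 has 2 positives, class 2 has 5 positives
y_true = np.zeros((10, 3))
y_true[0, 0] = 1
y_true[:2, 1] = 1
y_true[:5, 2] = 1

n = len(y_true)
pos = y_true.sum(axis=0)
# same formula as sklearn's 'balanced' mode: n_samples / (n_classes * count)
weights = np.stack([n / (2 * (n - pos)),   # column 0: background (label 0) weight
                    n / (2 * pos)],        # column 1: signal (label 1) weight
                   axis=1)
```

The rarer the positive label in a column, the larger its signal weight, so a class with a single positive out of 10 samples gets weight 10/(2*1) = 5.0.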

The solution is now to build your own binary cross-entropy loss function in which you multiply in the weights yourself:

from keras import backend as K

def get_weighted_loss(weights):
    def weighted_loss(y_true, y_pred):
        # weights[:,0]**(1-y_true) * weights[:,1]**y_true evaluates to the
        # background weight where y_true == 0 and the signal weight where
        # y_true == 1, so each element of the cross-entropy is rescaled
        return K.mean((weights[:, 0] ** (1 - y_true)) * (weights[:, 1] ** y_true) * K.binary_crossentropy(y_true, y_pred), axis=-1)
    return weighted_loss

weights[:,0] is an array with all the background weights and weights[:,1] contains all the signal weights.
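A quick numpy check with toy weight values shows how the exponent trick selects the right column per element (any exponent to the power 0 is 1, so only the matching weight survives):

```python
import numpy as np

weights = np.array([[0.55, 5.0],    # class 0: [background weight, signal weight]
                    [0.62, 2.5]])   # class 1 (toy values)
y_true = np.array([[1., 0.],
                   [0., 1.]])

# where y_true == 1: w0**0 * w1**1 = w1 (signal weight)
# where y_true == 0: w0**1 * w1**0 = w0 (background weight)
per_element = (weights[:, 0] ** (1 - y_true)) * (weights[:, 1] ** y_true)
```

`per_element` here is [[5.0, 0.62], [0.55, 2.5]]: each position picks the weight matching its true label.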

All that is left is to include this loss in the compile function:

class_weights = calculating_class_weights(Y_train)
model.compile(optimizer=Adam(), loss=get_weighted_loss(class_weights))
