Implementation of WARP loss in Keras


Problem description

I am trying to implement WARP loss (a type of pairwise ranking function) with the Keras API, and I am somewhat stuck on how this can be accomplished.

The definition of WARP loss is taken from the LightFM documentation:

For a given (user, positive item) pair, sample a negative item at random from all the remaining items. Compute predictions for both items; if the negative item's prediction exceeds that of the positive item plus a margin, perform a gradient update to rank the positive item higher and the negative item lower. If there is no rank violation, continue sampling negative items until a violation is found.
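
To make the sampling procedure above concrete, here is a minimal NumPy sketch of a single WARP step; the score function, the margin, and the max_trials cap are my own assumptions for illustration, not part of the original definition:

import numpy as np

def warp_sample_step(score_fn, user, positive_item, all_items, margin=1.0, max_trials=100):
    # score_fn(user, item) -> float is an assumed scoring function,
    # e.g. a dot product of user and item embeddings
    pos_score = score_fn(user, positive_item)
    candidates = [item for item in all_items if item != positive_item]
    for trials in range(1, max_trials + 1):
        negative_item = candidates[np.random.randint(len(candidates))]
        if score_fn(user, negative_item) > pos_score - margin:
            # the harder it was to find a violating negative, the lower the
            # estimated rank of the positive item and the smaller the weight
            estimated_rank = max((len(all_items) - 1) // trials, 1)
            return negative_item, np.log(estimated_rank)
    return None  # no violation found within max_trials: skip the update

The returned log-rank weight is what scales the gradient update for the violating pair.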

The WARP loss is used, for example, in a paper on semantic embeddings of #hashtags published by Facebook AI Research. In this paper they try to predict the most representative hashtags for short texts, where the 'user' is the short text, the 'positive item' is the hashtag of the short text, and the negative items are random hashtags uniformly sampled from the hashtag lookup.

I am following the implementation of another triplet loss to create the WARP one: github

My understanding is that for each data point I will have 3 inputs. Example with embeddings ('semi' pseudocode):

sequence_input = Input(shape=(100, ), dtype='int32') # 100 features per data point
positive_example = Input(shape=(1, ), dtype='int32', name="positive") # the one positive example
negative_examples = Input(shape=(1000,), dtype='int32', name="random_negative_examples") # 1000 random negative examples.

#map data points to already created embeddings
embedded_seq_input = embedded_layer(sequence_input)
embedded_positive = embedded_layer(positive_example)
embedded_negatives = embedded_layer(negative_examples)

conv1 = Convolution1D(...)(embedded_seq_input)
               .
               .
               .
z = Dense(vector_size_of_embedding,activation="linear")(convN)

loss = merge([z, embedded_positive, embedded_negatives],mode=warp_loss)
                         .
                         .
                         .

where warp_loss is (I am assuming taking 1000 random negatives instead of all of them, and the scores come from the cosine similarity):

def warp_loss(X):
    # pseudocode
    z, positive, negatives = X
    positive_score = cosine_similarity(z, positive)
    counts = 1
    loss = 0
    for negative in negatives:
        score = cosine_similarity(z, negative)
        if score > positive_score:
            loss = ((number_of_labels - 1) / counts) * (score + 1 - positive_score)
        else:
            counts += 1
    return loss
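
For reference, one common way to train such a merged loss tensor in pre-2.0 Keras (which the merge(..., mode=...) call above implies) is to expose the loss as the model output and compile with an identity loss. The snippet below is only a sketch of that wiring, reusing the input names from the pseudocode above:

from keras.models import Model
import keras.backend as K

def identity_loss(y_true, y_pred):
    # the real objective is already computed inside the graph, so the dummy
    # targets are ignored and the merged loss tensor is minimized directly
    return K.mean(y_pred - 0 * y_true)

model = Model(input=[sequence_input, positive_example, negative_examples], output=loss)
model.compile(optimizer='adam', loss=identity_loss)
# fit with dummy targets, e.g. np.zeros((num_samples, 1))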

How to compute the WARP loss is described nicely here: post

I am not sure if this is the correct way of doing it, but I couldn't find a way to implement the warp_loss pseudo-function. I can compute the cosine similarity using merge([x, u], mode='cos'), but this assumes the same dimensions, so I am not sure how to use merge mode 'cos' for the multiple negative examples. That is why I am trying to create my own warp_loss.
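
On the dimension mismatch: one workaround (a sketch using Keras backend operations, not something from the original post) is to L2-normalize the embeddings and reduce over the embedding axis yourself, which gives the cosine similarity of z against every sampled negative at once:

import keras.backend as K

def cosine_scores(z, negatives):
    # z: (batch, dim), negatives: (batch, n_negatives, dim)
    z = K.l2_normalize(z, axis=-1)
    negatives = K.l2_normalize(negatives, axis=-1)
    # broadcast (batch, 1, dim) against (batch, n_negatives, dim) and sum over dim
    return K.sum(negatives * K.expand_dims(z, 1), axis=-1)  # (batch, n_negatives)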

Any insights, similar implemented examples, or comments are welcome.

Recommended answer

First of all, I would argue that it is not possible to implement WARP in the batch training paradigm, and therefore you can't implement WARP in Keras. This is because WARP is intrinsically sequential, so it can't handle data broken into batches the way Keras expects. I suppose if you used fully stochastic batches, you could pull it off.

Typically for WARP you include a margin of 1, but as in the paper you can treat it as a hyperparameter:

if neg_score > pos_score - 1:  # margin of 1
    loss = log(num_items / counts)  # loss weighted by the sample count
    loss = max(1, loss)  # this looks like the same thing you were doing, in a different way

This is superior to its predecessor BPR, in that it optimizes for top-k precision instead of average precision.
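
If you still want a batch-friendly stand-in despite the above, one approximation (my own sketch, not exact WARP and not part of the original answer) is to score a fixed set of sampled negatives per example, keep the margin violations, and weight them by a rank estimated from the fraction of violating negatives:

import keras.backend as K

def approximate_warp_loss(pos_scores, neg_scores, num_labels, margin=1.0):
    # pos_scores: (batch, 1), neg_scores: (batch, n_sampled_negatives)
    violations = K.maximum(neg_scores - pos_scores + margin, 0.0)  # hinge per negative
    n_sampled = K.cast(K.shape(neg_scores)[1], K.floatx())
    n_violating = K.sum(K.cast(K.greater(violations, 0.0), K.floatx()), axis=-1, keepdims=True)
    # estimated rank of the positive item, scaled up from the sampled negatives
    rank = 1.0 + (num_labels - 1.0) * n_violating / n_sampled
    # log-rank weighting applied to the average violation, averaged over the batch
    return K.mean(K.log(rank) * K.mean(violations, axis=-1, keepdims=True))

This trades the sequential sampling for a one-shot estimate, which is why it only approximates the original objective.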
