How to correctly ignore padded or missing timesteps at decoding time in multi-feature sequences with an LSTM autoencoder

Problem description

I am trying to learn a latent representation for text sequences with multiple features (3) by training an autoencoder to reconstruct them. Because some sequences are shorter than the maximum pad length, i.e. the number of timesteps I am considering (seq_length=15), I am not sure whether the reconstruction will learn to ignore the padded timesteps when computing the loss or accuracies.

I followed the suggestions from this answer to crop the outputs, but my losses are nan, and so are several of the accuracies.

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

seq_length = 15  # maximum (padded) sequence length

# Three integer-token inputs; token 0 is padding
input1 = keras.Input(shape=(seq_length,), name='input_1')
input2 = keras.Input(shape=(seq_length,), name='input_2')
input3 = keras.Input(shape=(seq_length,), name='input_3')
# mask_zero=True makes each embedding emit a mask that is False at padded steps
input1_emb = layers.Embedding(70, 32, mask_zero=True)(input1)
input2_emb = layers.Embedding(462, 192, mask_zero=True)(input2)
input3_emb = layers.Embedding(84, 36, mask_zero=True)(input3)
merged = layers.Concatenate()([input1_emb, input2_emb, input3_emb])  # the three masks are combined
activ_func = 'tanh'
encoded = layers.LSTM(120, activation=activ_func, return_sequences=True)(merged)
encoded = layers.LSTM(60, activation=activ_func, return_sequences=True)(encoded)
# The last encoder LSTM returns only its final state, so the mask stops here
encoded = layers.LSTM(15, activation=activ_func)(encoded)

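# Decoder: reconstruct the inputs from the latent vector.
# RepeatVector receives no mask (the last encoder LSTM dropped it),
# so the decoder treats padded timesteps like real ones.
decoded1 = layers.RepeatVector(seq_length)(encoded)
decoded1 = layers.LSTM(60, activation=activ_func, return_sequences=True)(decoded1)
decoded1 = layers.LSTM(120, activation=activ_func, return_sequences=True, name='decoder1_last')(decoded1)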

Decoder one has an output shape of (None, 15, 120).
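Whether the reconstruction ignores padded steps comes down to Keras mask propagation. The following minimal sketch (toy token values chosen for illustration, not from the question) checks it directly with compute_mask: the embedding's mask passes through an LSTM with return_sequences=True, but an LSTM that returns only its last state outputs no mask, so RepeatVector and the decoder layers never receive one.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Toy batch: one sequence with two real tokens and two padding zeros (illustrative values)
tokens = np.array([[5, 3, 0, 0]])

emb = layers.Embedding(70, 8, mask_zero=True)
x = emb(tokens)
mask = emb.compute_mask(tokens)  # True for real steps, False for padding

seq_lstm = layers.LSTM(4, return_sequences=True)  # like the first two encoder LSTMs
last_lstm = layers.LSTM(4)                         # like the final encoder LSTM

seq_mask = seq_lstm.compute_mask(x, mask)    # mask is passed through unchanged
last_mask = last_lstm.compute_mask(x, mask)  # None: no per-timestep output, so no mask

print(np.asarray(mask))  # [[ True  True False False]]
print(last_mask)         # None
```

In other words, with this architecture the padded positions of the decoder outputs are scored by the loss unless something masks them explicitly.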

input_copy_1 = layers.TimeDistributed(layers.Dense(70, activation='softmax'))(decoded1)
input_copy_2 = layers.TimeDistributed(layers.Dense(462, activation='softmax'))(decoded1)
input_copy_3 = layers.TimeDistributed(layers.Dense(84, activation='softmax'))(decoded1)

For each output, I am trying to crop the 0-padded timesteps as suggested by this answer. padding is 0 where the actual input was missing (zero due to padding) and 1 otherwise.

def cropOutputs(x):
    # x[0]: time-distributed softmax of the respective feature, shape (batch, seq_length, n_classes)
    # x[1]: the actual integer input feature, shape (batch, seq_length); 0 = padding
    # Use the whole batch x[1]; x[1][1] would pick only the second sample's tokens
    padding = tf.cast(tf.not_equal(x[1], 0), dtype=x[0].dtype)
    # A (batch, seq_length, 1) mask broadcasts over the class axis, so no tiling is needed
    return x[0] * tf.expand_dims(padding, axis=-1)

Applying the crop function to all three outputs:

# Lambda's output_shape excludes the batch dimension
input_copy_1 = layers.Lambda(cropOutputs, name='input_copy_1', output_shape=(seq_length, 70))([input_copy_1, input1])
input_copy_2 = layers.Lambda(cropOutputs, name='input_copy_2', output_shape=(seq_length, 462))([input_copy_2, input2])
input_copy_3 = layers.Lambda(cropOutputs, name='input_copy_3', output_shape=(seq_length, 84))([input_copy_3, input3])
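A quick check of the difference between building the padding mask from x[1][1] and from the whole x[1], using toy tensors (shapes and values are illustrative). x[1][1] is the second sample's token row only, so one sample's padding pattern gets applied to every sequence in the batch, and because a (seq_length, n_classes) mask still broadcasts against (batch, seq_length, n_classes), this does not raise an error. A per-sample mask zeroes exactly each sequence's own padded steps:

```python
import tensorflow as tf

# Toy batch: 2 sequences, 4 timesteps, 3 classes; 0 marks padding (illustrative values)
tokens = tf.constant([[1, 2, 0, 0],
                      [3, 0, 0, 0]])
probs = tf.fill([2, 4, 3], 1.0 / 3.0)  # stand-in for a time-distributed softmax

# tokens[1] is what x[1][1] selects: only the second sample's padding pattern
second_sample_mask = tf.cast(tf.not_equal(tokens[1], 0), probs.dtype)  # shape (4,)

# Per-sample mask over the whole batch: shape (2, 4)
per_sample_mask = tf.cast(tf.not_equal(tokens, 0), probs.dtype)
cropped = probs * per_sample_mask[..., tf.newaxis]  # broadcast over the class axis

# Each kept step sums to 1 over its classes, so this counts kept steps per sequence
kept_steps = tf.reduce_sum(cropped, axis=[1, 2])
print(second_sample_mask.numpy())  # [1. 0. 0. 0.]
print(kept_steps.numpy())          # about 2 and 1
```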

My logic is to crop the timesteps of each feature (all 3 features of a sequence have the same length, meaning they miss timesteps together). But each timestep has had a softmax applied over its feature's vocabulary size (70, 462, 84), so to zero out a timestep I use the padding mask to build a multi-dimensional mask of zeros and ones matching that feature size, and multiply the respective softmax output by it.
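The multi-dimensional mask does not have to be built explicitly: multiplying by a (batch, seq_length, 1) mask lets broadcasting repeat it across the class axis. A small sketch with toy shapes (n_classes and the values are illustrative) comparing tf.tile with broadcasting; note that once the batch dimension is kept, tile needs three multiples, not two:

```python
import tensorflow as tf

n_classes = 5  # stands in for 70, 462 or 84
padding = tf.constant([[1.0, 1.0, 0.0]])      # (batch=1, seq_length=3), 0 = padded step
probs = tf.random.uniform([1, 3, n_classes])  # stand-in for softmax outputs

# Explicit mask: tile to (batch, seq_length, n_classes)
tiled_mask = tf.tile(tf.expand_dims(padding, -1), [1, 1, n_classes])
tiled = probs * tiled_mask

# Broadcast mask: (batch, seq_length, 1) is stretched over the class axis automatically
broadcast = probs * tf.expand_dims(padding, -1)

same = bool(tf.reduce_all(tf.equal(tiled, broadcast)))
print(same)  # True
```

Both give identical values; broadcasting just avoids materializing the large (batch, seq_length, 462) mask.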

I am not sure whether I am doing this right, because I get NaN losses for these outputs, and the accuracies of the other tasks I am learning jointly with this one become NaN as well (this happens only with the cropping).
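One likely source of the NaN (an inference, not confirmed in the thread): for probability inputs, Keras's categorical cross-entropy in the TensorFlow backend rescales y_pred so that each timestep's probabilities sum to 1 before taking the log. A timestep whose softmax was zeroed by the crop sums to 0, and 0/0 is NaN, which then propagates into the summed loss and the gradients shared with the other tasks. The arithmetic in isolation, in NumPy (this mimics the rescale, clip and log steps; it is not Keras's own code):

```python
import numpy as np

# One timestep's softmax after cropping: every class probability was multiplied by 0
cropped_step = np.zeros(3, dtype=np.float32)
target = np.array([1.0, 0.0, 0.0], dtype=np.float32)

with np.errstate(invalid="ignore"):  # silence the 0/0 RuntimeWarning
    # Rescaling step performed before the log: 0 / 0 -> nan
    rescaled = cropped_step / cropped_step.sum()

# In NumPy, clipping does not repair NaN, so the cross-entropy stays NaN
clipped = np.clip(rescaled, 1e-7, 1 - 1e-7)
loss = -np.sum(target * np.log(clipped))
print(rescaled)        # [nan nan nan]
print(np.isnan(loss))  # True
```

Masking the loss instead of the predictions avoids this, because the softmax outputs are never zeroed.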

Recommended answer

In case it helps someone: I ended up cropping the padded entries from the loss directly (taking some Keras code pointers from these answers).

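def make_masked_cc_loss(num_classes, pad_index=0):
    # Assumption: targets are one-hot over num_classes and padding is token pad_index,
    # so masked_val_hotencoded is the one-hot encoding of the padding token
    masked_val_hotencoded = tf.one_hot(pad_index, num_classes)

    def masked_cc_loss(y_true, y_pred):
        # True at padded timesteps, shape (batch, seq_length)
        mask = tf.reduce_all(tf.equal(tf.cast(y_true, tf.float32), masked_val_hotencoded), axis=-1)
        mask = 1.0 - tf.cast(mask, y_pred.dtype)
        # Per-timestep loss (batch, seq_length); a CategoricalCrossentropy() object would
        # already reduce to a scalar, which would make the mask multiplication meaningless
        loss = tf.keras.losses.categorical_crossentropy(y_true, y_pred) * mask
        # Average by the number of unmasked entries; divide_no_nan guards an all-padding batch
        return tf.math.divide_no_nan(tf.reduce_sum(loss), tf.reduce_sum(mask))

    return masked_cc_loss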
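Since the inputs are already integer token ids with 0 as padding, a variant of the same idea (not from the original answer; a sketch assuming padding index 0 in every feature) skips one-hot targets and masks on the integer targets with sparse categorical cross-entropy. masked_sparse_cc_loss is a hypothetical name. The toy check gives the padded step a deliberately poor prediction to show that it does not affect the result:

```python
import math
import tensorflow as tf

def masked_sparse_cc_loss(y_true, y_pred):
    # Hypothetical helper: y_true (batch, seq_length) integer ids, 0 = padding;
    # y_pred (batch, seq_length, n_classes) softmax probabilities
    mask = tf.cast(tf.not_equal(y_true, 0), y_pred.dtype)
    loss = tf.keras.losses.sparse_categorical_crossentropy(y_true, y_pred)  # (batch, seq_length)
    return tf.math.divide_no_nan(tf.reduce_sum(loss * mask), tf.reduce_sum(mask))

# Toy example with 4 classes: two real steps with uniform predictions, one padded step
y_true = tf.constant([[1, 2, 0]])
y_pred = tf.constant([[[0.25, 0.25, 0.25, 0.25],
                       [0.25, 0.25, 0.25, 0.25],
                       [0.01, 0.33, 0.33, 0.33]]])  # padded step: low probability for class 0
value = float(masked_sparse_cc_loss(y_true, y_pred))
print(value, math.log(4))  # equal up to float rounding; the padded step is ignored
```

Such a loss could be given per output in model.compile, with the integer inputs themselves as the reconstruction targets.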
