How to modify the seq2seq cost function for padded vectors?
TensorFlow supports dynamic-length sequences through the `sequence_length` parameter passed when constructing an RNN layer: beyond `sequence_length` steps, the model stops learning from the sequence and simply returns zero vectors.

However, how can the cost function at https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/seq2seq.py#L890 be modified to account for the masked sequences, so that cost and perplexity are calculated only over the actual sequences rather than the whole padded sequence?
def sequence_loss_by_example(logits, targets, weights,
                             average_across_timesteps=True,
                             softmax_loss_function=None, name=None):
  if len(targets) != len(logits) or len(weights) != len(logits):
    raise ValueError("Lengths of logits, weights, and targets must be the same "
                     "%d, %d, %d." % (len(logits), len(weights), len(targets)))
  with ops.op_scope(logits + targets + weights, name,
                    "sequence_loss_by_example"):
    log_perp_list = []
    for logit, target, weight in zip(logits, targets, weights):
      if softmax_loss_function is None:
        # TODO(irving,ebrevdo): This reshape is needed because
        # sequence_loss_by_example is called with scalars sometimes, which
        # violates our general scalar strictness policy.
        target = array_ops.reshape(target, [-1])
        crossent = nn_ops.sparse_softmax_cross_entropy_with_logits(
            logit, target)
      else:
        crossent = softmax_loss_function(logit, target)
      log_perp_list.append(crossent * weight)
    log_perps = math_ops.add_n(log_perp_list)
    if average_across_timesteps:
      total_size = math_ops.add_n(weights)
      total_size += 1e-12  # Just to avoid division by 0 for all-0 weights.
      log_perps /= total_size
    return log_perps
This function already supports calculating costs for dynamic sequence lengths through the use of weights. As long as you ensure the weights are 0 for the "padding targets", the cross entropy will be pushed to 0 for those steps:
log_perp_list.append(crossent * weight)
and the total size will also reflect only the non-padding steps:
total_size = math_ops.add_n(weights)
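To make the masking arithmetic concrete, here is a minimal NumPy sketch of what those two lines do for a single example (the cross-entropy values below are made up for illustration; the real function operates on TensorFlow tensors across a batch):

```python
import numpy as np

# Per-timestep cross-entropy for one example: the first two steps are
# real tokens, the last two come from padding.
crossent = np.array([1.2, 0.8, 0.5, 0.5])
weights = np.array([1.0, 1.0, 0.0, 0.0])  # 0 for padded steps

masked = crossent * weights          # padding contributes 0 to the loss
total_size = weights.sum() + 1e-12   # counts only the real steps
loss = masked.sum() / total_size     # averages over real steps: (1.2 + 0.8) / 2
```

Note that without the weights, the same average would be taken over all four steps and the padded positions would drag the loss (and hence the perplexity) toward whatever the model predicts for padding.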
If you're padding with zeros, one way to derive the weights is as follows:
weights = tf.sign(tf.abs(model.targets))
(Note that you might need to cast this to the same type as your targets)