Why use softmax as opposed to standard normalization?


Problem Description

In the output layer of a neural network, it is typical to use the softmax function to approximate a probability distribution:

softmax(x)_i = exp(x_i) / sum_j exp(x_j)

This is expensive to compute because of the exponents. Why not simply perform a Z transform so that all outputs are positive, and then normalise just by dividing all outputs by the sum of all outputs?
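
The question does not show the proposed alternative in code. A minimal sketch, assuming "Z transform" here loosely means shifting the outputs so they are all non-negative before dividing by the sum (the function name normalize is hypothetical, not from the question):

import numpy as np

def normalize(x):
    # hypothetical version of the asker's proposal: shift the outputs
    # so they are non-negative, then divide by the sum so they total 1
    x = np.asarray(x, dtype=float)
    shifted = x - x.min()
    return shifted / shifted.sum()  # assumes not all outputs are equal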

Solution

Softmax has one nice property compared with standard normalisation.

It reacts to low stimulation of your neural net (think of a blurry image) with a rather uniform distribution, and to high stimulation (i.e. large numbers, think of a crisp image) with probabilities close to 0 and 1.

Standard normalisation, by contrast, does not care as long as the proportions are the same: std_norm(c·x) = std_norm(x) for any positive scale c, whereas exponentiation amplifies the differences between large inputs.

Have a look at what happens when softmax gets an input 10 times larger, i.e. your neural net got a crisp image and a lot of neurons were activated:

>>> softmax([1,2])              # blurry image of a ferret
[0.26894142,      0.73105858]   #     it is a cat perhaps !?
>>> softmax([10,20])            # crisp image of a cat
[0.0000453978687, 0.999954602]  #     it is definitely a CAT !

And then compare it with standard normalisation:

>>> std_norm([1,2])                      # blurry image of a ferret
[0.3333333333333333, 0.6666666666666666] #     it is a cat perhaps !?
>>> std_norm([10,20])                    # crisp image of a cat
[0.3333333333333333, 0.6666666666666666] #     it is a cat perhaps !?
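
For reference, here is a minimal sketch of the two functions used in the session above; the answer itself does not define them, so the NumPy implementation below is an assumption, not the original author's code:

import numpy as np

def softmax(x):
    # exponentiate, then normalise; subtracting the max first avoids
    # overflow and does not change the result
    e = np.exp(np.asarray(x, dtype=float) - np.max(x))
    return e / e.sum()

def std_norm(x):
    # plain normalisation: divide each output by the sum of all outputs
    x = np.asarray(x, dtype=float)
    return x / x.sum()

print(softmax([1, 2]))     # -> [0.26894142 0.73105858]
print(softmax([10, 20]))   # -> [4.53978687e-05 9.99954602e-01]
print(std_norm([1, 2]))    # -> [0.33333333 0.66666667]
print(std_norm([10, 20]))  # -> [0.33333333 0.66666667]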
