Why use softmax as opposed to standard normalization?


Question

In the output layer of a neural network, it is typical to use the softmax function to approximate a probability distribution:
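(The formula itself did not survive extraction from the original post; the standard softmax definition being referred to is)

$$\sigma(\mathbf{z})_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}} \qquad \text{for } j = 1, \dots, K$$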

This is expensive to compute because of the exponents. Why not simply perform a Z transform so that all outputs are positive, and then normalise just by dividing all outputs by the sum of all outputs?

Answer

Softmax has one nice property compared with standard normalisation.

It reacts to low stimulation of your neural net (think: a blurry image) with a rather uniform distribution, and to high stimulation (i.e. large numbers, think: a crisp image) with probabilities close to 0 and 1.

Standard normalisation, by contrast, does not care as long as the proportions are the same.

Have a look at what happens when softmax receives inputs 10 times larger, i.e. your neural net got a crisp image and a lot of neurons got activated:

>>> softmax([1,2])              # blurry image of a ferret
[0.26894142,      0.73105858]   #     it is a cat perhaps !?
>>> softmax([10,20])            # crisp image of a cat
[0.0000453978687, 0.999954602]  #     it is definitely a CAT !

And then compare it with standard normalisation:

>>> std_norm([1,2])                      # blurry image of a ferret
[0.3333333333333333, 0.6666666666666666] #     it is a cat perhaps !?
>>> std_norm([10,20])                    # crisp image of a cat
[0.3333333333333333, 0.6666666666666666] #     it is a cat perhaps !?
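The examples above call softmax and std_norm without showing their definitions. A minimal sketch that reproduces these numbers, assuming NumPy and taking std_norm to mean plain divide-by-the-sum normalisation:

import numpy as np

def softmax(x):
    # exponentiate each output, then divide by the sum so the results add up to 1
    e = np.exp(np.asarray(x, dtype=float))
    return e / e.sum()

def std_norm(x):
    # plain normalisation: divide each (positive) output by the sum of all outputs
    x = np.asarray(x, dtype=float)
    return x / x.sum()

print(softmax([1, 2]))     # approx [0.26894142, 0.73105858]
print(softmax([10, 20]))   # approx [4.5398e-05, 0.99995]
print(std_norm([1, 2]))    # [0.3333..., 0.6666...]
print(std_norm([10, 20]))  # [0.3333..., 0.6666...]  -- unchanged, only proportions matter

In practice, implementations usually subtract max(x) before exponentiating for numerical stability; this does not change the result.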
