Difference between np.dot and np.multiply with np.sum in binary cross-entropy loss calculation


Problem description


I have tried the following code, but I can't see the difference between np.dot and np.multiply combined with np.sum.

Here is the np.dot code

logprobs = np.dot(Y, (np.log(A2)).T) + np.dot((1.0-Y),(np.log(1 - A2)).T)
print(logprobs.shape)
print(logprobs)
cost = (-1/m) * logprobs
print(cost.shape)
print(type(cost))
print(cost)

Its output is

(1, 1)
[[-2.07917628]]
(1, 1)
<class 'numpy.ndarray'>
[[ 0.693058761039 ]]

Here is the code for np.multiply with np.sum

logprobs = np.sum(np.multiply(np.log(A2), Y) + np.multiply((1 - Y), np.log(1 - A2)))
print(logprobs.shape)         
print(logprobs)
cost = - logprobs / m
print(cost.shape)
print(type(cost))
print(cost)

Its output is

()
-2.07917628312
()
<class 'numpy.float64'>
0.693058761039

I'm unable to understand the difference in type and shape, while the resulting value is the same in both cases.

Even when squeezing the cost from the former code, its value becomes identical to the latter's, but the type remains numpy.ndarray.

cost = np.squeeze(cost)
print(type(cost))
print(cost)

output is

<class 'numpy.ndarray'>
0.6930587610394646

Solution

What you're doing is calculating the binary cross-entropy loss, which measures how bad the model's predictions (here: A2) are compared to the true outputs (here: Y).

Here is a reproducible example for your case, which should explain why you get a scalar in the second case when using np.sum (m below is the number of examples, i.e. Y.shape[1] = 8):

In [88]: Y = np.array([[1, 0, 1, 1, 0, 1, 0, 0]])

In [89]: A2 = np.array([[0.8, 0.2, 0.95, 0.92, 0.01, 0.93, 0.1, 0.02]])

In [90]: logprobs = np.dot(Y, (np.log(A2)).T) + np.dot((1.0-Y),(np.log(1 - A2)).T)

# `np.dot` returns 2D array since its arguments are 2D arrays
In [91]: logprobs
Out[91]: array([[-0.78914626]])

In [92]: cost = (-1/m) * logprobs

In [93]: cost
Out[93]: array([[ 0.09864328]])

In [94]: logprobs = np.sum(np.multiply(np.log(A2), Y) + np.multiply((1 - Y), np.log(1 - A2)))

# np.sum returns scalar since it sums everything in the 2D array
In [95]: logprobs
Out[95]: -0.78914625761870361

Note that np.dot sums only along the inner dimensions, which match here: (1x8) and (8x1). So the 8s disappear during the dot product (matrix multiplication), yielding a (1x1) result, which is just a scalar but is returned as a 2D array of shape (1,1).
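The shape bookkeeping can be checked directly. This is a minimal sketch with made-up arrays of the same shapes as Y (1x8) and np.log(A2).T (8x1):

```python
import numpy as np

a = np.ones((1, 8))  # row vector, same shape as Y
b = np.ones((8, 1))  # column vector, same shape as np.log(A2).T

dot = np.dot(a, b)
print(dot.shape)     # (1, 1) -- the matching inner 8s are summed away
print(type(dot))     # <class 'numpy.ndarray'>

total = np.sum(a)    # np.sum collapses every axis into a 0-d scalar
print(np.shape(total))  # ()
print(type(total))      # <class 'numpy.float64'>
```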


Also, most importantly, note that here np.dot is exactly the same as np.matmul, since the inputs are 2D arrays (i.e. matrices):

In [107]: logprobs = np.matmul(Y, (np.log(A2)).T) + np.matmul((1.0-Y),(np.log(1 - A2)).T)

In [108]: logprobs
Out[108]: array([[-0.78914626]])

In [109]: logprobs.shape
Out[109]: (1, 1)
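As a quick sketch of this equivalence (the row vectors below are hypothetical stand-ins for Y and np.log(A2)), np.dot, np.matmul and the @ operator (Python 3.5+) all agree for 2D inputs:

```python
import numpy as np

# Hypothetical stand-ins for Y and np.log(A2)
Y = np.array([[1.0, 0.0, 1.0]])
L = np.array([[-0.22, -1.61, -0.05]])

r1 = np.dot(Y, L.T)
r2 = np.matmul(Y, L.T)
r3 = Y @ L.T  # the @ operator dispatches to matmul

print(np.allclose(r1, r2) and np.allclose(r2, r3))  # True
print(r1.shape)  # (1, 1)
```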


Return result as a scalar value

np.dot or np.matmul returns whatever the resulting array shape would be, based on the input arrays. Even with the out= argument, it's not possible to return a scalar if the inputs are 2D arrays. However, we can use np.asscalar() on the result to convert it to a scalar if the result array has shape (1,1) (or, more generally, a scalar value wrapped in an nD array):

In [123]: np.asscalar(logprobs)
Out[123]: -0.7891462576187036

In [124]: type(np.asscalar(logprobs))
Out[124]: float


ndarray of size 1 to scalar value

In [127]: np.asscalar(np.array([[[23.2]]]))
Out[127]: 23.2

In [128]: np.asscalar(np.array([[[[23.2]]]]))
Out[128]: 23.2
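One caveat: np.asscalar() was deprecated in NumPy 1.16 and removed in NumPy 1.23, so on recent versions the equivalent ndarray.item() method is the way to go. A minimal sketch (the (1,1) array below mirrors the np.dot result from above):

```python
import numpy as np

logprobs = np.array([[-0.78914626]])  # a (1, 1) result, as from np.dot

# .item() extracts the single element as a plain Python float
val = logprobs.item()
print(type(val))  # <class 'float'>

# It also unwraps a size-1 array of any dimensionality
print(np.array([[[23.2]]]).item())  # 23.2
```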
