Difference between np.dot and np.multiply with np.sum in binary cross-entropy loss calculation
Question
I have tried the following code, but I couldn't find the difference between np.dot and np.multiply with np.sum.
Here is the np.dot code:
logprobs = np.dot(Y, (np.log(A2)).T) + np.dot((1.0-Y),(np.log(1 - A2)).T)
print(logprobs.shape)
print(logprobs)
cost = (-1/m) * logprobs
print(cost.shape)
print(type(cost))
print(cost)
Its output is:
(1, 1)
[[-2.07917628]]
(1, 1)
<class 'numpy.ndarray'>
[[ 0.693058761039]]
Here is the code for np.multiply with np.sum:
logprobs = np.sum(np.multiply(np.log(A2), Y) + np.multiply((1 - Y), np.log(1 - A2)))
print(logprobs.shape)
print(logprobs)
cost = - logprobs / m
print(cost.shape)
print(type(cost))
print(cost)
Its output is:
()
-2.07917628312
()
<class 'numpy.float64'>
0.693058761039
I'm unable to understand the type and shape difference, even though the resulting value is the same in both cases.
Even after squeezing the cost from the former code, its value becomes the same as the latter, but the type remains an ndarray:
cost = np.squeeze(cost)
print(type(cost))
print(cost)
The output is:
<class 'numpy.ndarray'>
0.6930587610394646
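The squeezed result is still a NumPy array, just a 0-dimensional one; `.item()` is what yields a native Python scalar. A small sketch of the distinction, using a hypothetical (1, 1) cost value:

```python
import numpy as np

cost = np.array([[0.6930587610394646]])  # hypothetical (1, 1) cost

# np.squeeze drops the size-1 dimensions but keeps the ndarray type
squeezed = np.squeeze(cost)
print(squeezed.shape)       # () -- 0-d, but still an ndarray
print(type(squeezed))       # <class 'numpy.ndarray'>

# .item() extracts the single element as a true Python float
print(type(cost.item()))    # <class 'float'>
```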
Answer
What you're doing is calculating the binary cross-entropy loss, which measures how bad the predictions (here: A2) of the model are when compared to the true outputs (here: Y).
Here is a reproducible example for your case, which should explain why you get a scalar in the second case when using np.sum:
In [88]: Y = np.array([[1, 0, 1, 1, 0, 1, 0, 0]])
In [89]: A2 = np.array([[0.8, 0.2, 0.95, 0.92, 0.01, 0.93, 0.1, 0.02]])
In [90]: logprobs = np.dot(Y, (np.log(A2)).T) + np.dot((1.0-Y),(np.log(1 - A2)).T)
# `np.dot` returns 2D array since its arguments are 2D arrays
In [91]: logprobs
Out[91]: array([[-0.78914626]])
In [92]: cost = (-1/m) * logprobs   # m = Y.shape[1] = 8 here
In [93]: cost
Out[93]: array([[ 0.09864328]])
In [94]: logprobs = np.sum(np.multiply(np.log(A2), Y) + np.multiply((1 - Y), np.log(1 - A2)))
# np.sum returns scalar since it sums everything in the 2D array
In [95]: logprobs
Out[95]: -0.78914625761870361
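The two expressions really do compute the same number; only the container differs. A quick self-contained check, reusing the Y and A2 values above:

```python
import numpy as np

Y = np.array([[1, 0, 1, 1, 0, 1, 0, 0]])
A2 = np.array([[0.8, 0.2, 0.95, 0.92, 0.01, 0.93, 0.1, 0.02]])

# matrix-product form: stays a (1, 1) 2-D array
lp_dot = np.dot(Y, np.log(A2).T) + np.dot(1.0 - Y, np.log(1 - A2).T)
# elementwise-multiply-then-sum form: collapses to a 0-d scalar
lp_sum = np.sum(np.multiply(np.log(A2), Y) + np.multiply(1 - Y, np.log(1 - A2)))

print(np.allclose(lp_dot, lp_sum))      # True
print(lp_dot.shape, np.shape(lp_sum))   # (1, 1) ()
```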
Note that np.dot sums only along the inner dimensions, which match here: (1x8) and (8x1). So the 8s are gone during the dot product (matrix multiplication), yielding a (1x1) result, which is just a scalar but is returned as a 2D array of shape (1, 1).
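A minimal illustration of this contraction, with hypothetical all-ones arrays: 2-D inputs keep their outer dimensions, while the same product with 1-D inputs collapses to a plain scalar.

```python
import numpy as np

row = np.ones((1, 8))
col = np.ones((8, 1))

# 2-D inputs: the inner dimension 8 is summed away, outer dims (1, 1) remain
print(np.dot(row, col).shape)   # (1, 1)

# 1-D inputs: np.dot is an inner product and returns a plain scalar
vec = np.ones(8)
print(np.dot(vec, vec))         # 8.0
```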
Also, most importantly, note that np.dot here is exactly the same as np.matmul, since the inputs are 2D arrays (i.e. matrices):
In [107]: logprobs = np.matmul(Y, (np.log(A2)).T) + np.matmul((1.0-Y),(np.log(1 - A2)).T)
In [108]: logprobs
Out[108]: array([[-0.78914626]])
In [109]: logprobs.shape
Out[109]: (1, 1)
Return the result as a scalar value
np.dot or np.matmul returns whatever the resulting array shape would be, based on the input arrays. Even with the out= argument, it's not possible to return a scalar if the inputs are 2D arrays. However, we can use np.asscalar() on the result to convert it to a scalar if the result array has shape (1,1) (or, more generally, is a scalar value wrapped in an nD array):
In [123]: np.asscalar(logprobs)
Out[123]: -0.7891462576187036
In [124]: type(np.asscalar(logprobs))
Out[124]: float
A size-1 ndarray to a scalar value:
In [127]: np.asscalar(np.array([[[23.2]]]))
Out[127]: 23.2
In [128]: np.asscalar(np.array([[[[23.2]]]]))
Out[128]: 23.2
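Note that np.asscalar was deprecated in NumPy 1.16 and removed in NumPy 1.23; the documented replacement is ndarray.item(), which behaves the same way on size-1 arrays:

```python
import numpy as np

logprobs = np.array([[-0.78914626]])

# .item() extracts the single element as a native Python scalar
print(logprobs.item())                  # -0.78914626
print(type(logprobs.item()))            # <class 'float'>

# Works for a scalar wrapped at any depth, like np.asscalar did
print(np.array([[[[23.2]]]]).item())    # 23.2
```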