Why is np.dot imprecise? (n-dim arrays)


Problem description


Suppose we take np.dot of two 'float32' 2D arrays:

    res = np.dot(a, b)   # see CASE 1
    print(list(res[0]))  # list shows more digits
    

    [-0.90448684, -1.1708503, 0.907136, 3.5594249, 1.1374011, -1.3826287]
    

Numbers. Except, they can change:


CASE 1: slice a

    import numpy as np

    np.random.seed(1)
    a = np.random.randn(9, 6).astype('float32')
    b = np.random.randn(6, 6).astype('float32')

    for i in range(1, len(a)):
        print(list(np.dot(a[:i], b)[0])) # full shape: (i, 6)
    

    [-0.9044868,  -1.1708502, 0.90713596, 3.5594249, 1.1374012, -1.3826287]
    [-0.90448684, -1.1708503, 0.9071359,  3.5594249, 1.1374011, -1.3826288]
    [-0.90448684, -1.1708503, 0.9071359,  3.5594249, 1.1374011, -1.3826288]
    [-0.90448684, -1.1708503, 0.907136,   3.5594249, 1.1374011, -1.3826287]
    [-0.90448684, -1.1708503, 0.907136,   3.5594249, 1.1374011, -1.3826287]
    [-0.90448684, -1.1708503, 0.907136,   3.5594249, 1.1374011, -1.3826287]
    [-0.90448684, -1.1708503, 0.907136,   3.5594249, 1.1374011, -1.3826287]
    [-0.90448684, -1.1708503, 0.907136,   3.5594249, 1.1374011, -1.3826287]
    

Results differ, even though the printed slice derives from the exact same numbers multiplied.


CASE 2: flatten a, take a 1D version of b, then slice a:

    np.random.seed(1)
    a = np.random.randn(9, 6).astype('float32')
    b = np.random.randn(1, 6).astype('float32')
    
    for i in range(1, len(a)):
        a_flat = np.expand_dims(a[:i].flatten(), -1) # keep 2D
        print(list(np.dot(a_flat, b)[0])) # full shape: (i*6, 6)
    

    [-0.3393164, 0.9528787, 1.3627989, 1.5124314, 0.46389243, 1.437775]
    [-0.3393164, 0.9528787, 1.3627989, 1.5124314, 0.46389243, 1.437775]
    [-0.3393164, 0.9528787, 1.3627989, 1.5124314, 0.46389243, 1.437775]
    [-0.3393164, 0.9528787, 1.3627989, 1.5124314, 0.46389243, 1.437775]
    [-0.3393164, 0.9528787, 1.3627989, 1.5124314, 0.46389243, 1.437775]
    [-0.3393164, 0.9528787, 1.3627989, 1.5124314, 0.46389243, 1.437775]
    [-0.3393164, 0.9528787, 1.3627989, 1.5124314, 0.46389243, 1.437775]
    [-0.3393164, 0.9528787, 1.3627989, 1.5124314, 0.46389243, 1.437775]
    


CASE 3: stronger control; set all non-involved entries to zero: add a[1:] = 0 to the CASE 1 code. Result: discrepancies persist.
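
For concreteness, here is a minimal sketch of CASE 3 as described (CASE 1's code with a[1:] = 0 added; row 0 of the result depends only on a[0] and b):

    import numpy as np

    np.random.seed(1)
    a = np.random.randn(9, 6).astype('float32')
    b = np.random.randn(6, 6).astype('float32')
    a[1:] = 0  # zero all rows except a[0]; mathematically res[0] is unchanged

    for i in range(1, len(a)):
        print(list(np.dot(a[:i], b)[0]))  # row 0 still varies with i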


CASE 4: check indices other than [0]; like [0], results stabilize once the array has been enlarged a fixed number of times past the row's point of creation. Output

    np.random.seed(1)
    a = np.random.randn(9, 6).astype('float32')
    b = np.random.randn(6, 6).astype('float32')

    for j in range(len(a) - 2):
        for i in range(1, len(a)):
            res = np.dot(a[:i], b)
            try:    print(list(res[j]))
            except IndexError: pass  # row j doesn't exist until i > j
        print()
    


Hence, for the 2D * 2D case, results differ - but are consistent for 1D * 1D. From some of my reading, this appears to stem from 1D-1D using simple addition, whereas 2D-2D uses 'fancier', performance-boosting addition that may be less precise (pairwise addition, by contrast, trades the other way and improves precision). Nonetheless, I'm unable to understand why discrepancies vanish in CASE 1 once a is sliced past a set 'threshold'; the larger a and b, the later this threshold seems to lie, but it always exists.
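
As an illustration of the summation-order point (a sketch of the float32 mechanics, not NumPy's actual sgemm kernel): accumulating the same float32 values left-to-right versus pairwise rounds differently, and np.sum itself uses pairwise summation on contiguous float arrays:

    import numpy as np

    np.random.seed(1)
    v = np.random.randn(2**16).astype('float32')

    # Naive left-to-right accumulation, rounding to float32 at every step
    seq = np.float32(0)
    for p in v:
        seq = np.float32(seq + p)

    # np.sum groups the same additions pairwise (typically more accurate)
    pair = v.sum()

    exact = v.astype('float64').sum()  # float64 reference
    print(seq, pair)                            # often differ in trailing digits
    print(abs(seq - exact), abs(pair - exact))  # pairwise error is usually smaller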

All said: why is np.dot imprecise (and inconsistent) for ND-ND arrays? Relevant Git


Additional info:

• Environment: Win-10 OS, Python 3.7.4, Spyder 3.3.6 IDE, Anaconda 3.0 2019/10
• CPU: i7-7700HQ 2.8 GHz
• Numpy v1.16.5

Possible culprit library: Numpy MKL - also BLAS libraries; thanks to Bi Rico for noting


Stress-test code: as noted, discrepancies grow more frequent with larger arrays; if the above isn't reproducible, the below should be (if not, try larger dims). My output

    np.random.seed(1)
    a = (0.01*np.random.randn(9, 9999)).astype('float32') # first multiply then type-cast
    b = (0.01*np.random.randn(9999, 6)).astype('float32') # *0.01 to bound mults to < 1
    
    for i in range(1, len(a)):
        print(list(np.dot(a[:i], b)[0]))
    


Problem severity: the discrepancies shown are 'small', but no longer so when operating on a neural network with billions of numbers multiplied over a few seconds, and trillions over the entire runtime; reported model accuracy differs by entire tens of percent, per this thread.
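
To illustrate how such tiny discrepancies can matter at network scale, here is a hypothetical toy (not the model from the thread): perturb one input by a single float32 ULP and let repeated matmuls through a nonlinearity amplify the gap:

    import numpy as np

    np.random.seed(1)
    w  = (0.3 * np.random.randn(64, 64)).astype('float32')  # gain > 1: perturbations grow
    x1 = np.random.randn(1, 64).astype('float32')
    x2 = x1.copy()
    x2[0, 0] = np.nextafter(x1[0, 0], np.float32(np.inf))   # 1-ULP nudge

    for _ in range(100):  # toy "layers"
        x1, x2 = np.tanh(x1 @ w), np.tanh(x2 @ w)

    print(np.abs(x1 - x2).max())  # the 1-ULP gap typically grows by many orders of magnitude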

Below is a gif (not reproduced here) of arrays resulting from feeding a model what's basically a[0], with len(a)==1 vs. len(a)==32.


OTHER PLATFORMS results, with thanks to Paul's testing:

Case 1 reproduced (partly):

• Google Colab VM -- Intel Xeon 2.3 GHz -- Jupyter -- Python 3.6.8
• Win-10 Pro Docker Desktop -- Intel i7-8700K -- jupyter/scipy-notebook -- Python 3.7.3
• Ubuntu 18.04.2 LTS + Docker -- AMD FX-8150 -- jupyter/scipy-notebook -- Python 3.7.3

Note: these yield much lower error than shown above; two entries on the first row are off by 1 in the least significant digit from the corresponding entries in the other rows.
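
One way to quantify 'off by 1 in the least significant digit' is to measure differences in units in the last place (ULPs); a rough sketch against the full-product row:

    import numpy as np

    np.random.seed(1)
    a = np.random.randn(9, 6).astype('float32')
    b = np.random.randn(6, 6).astype('float32')

    ref = np.dot(a, b)[0]  # row 0 of the full product
    for i in range(1, len(a)):
        row  = np.dot(a[:i], b)[0]
        ulps = np.abs(row - ref) / np.spacing(np.abs(ref))  # approximate ULP distance
        print(i, ulps.astype(int))  # zeros where identical, small integers otherwise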

Case 1 not reproduced:

• Ubuntu 18.04.3 LTS -- Intel i7-8700K -- IPython 5.5.0 -- Python 2.7.15+ and 3.6.8 (2 tests)
• Ubuntu 18.04.3 LTS -- Intel i5-3320M -- IPython 5.5.0 -- Python 2.7.15+
• Ubuntu 18.04.2 LTS -- AMD FX-8150 -- IPython 5.5.0 -- Python 2.7.15rc1

Notes:

• The linked Colab notebook and Jupyter environments show a far smaller discrepancy (and only for the first two rows) than is observed on my system. Also, Case 2 never (yet) showed imprecision.
• Within this very limited sample, the current (Dockerized) Jupyter environment is more susceptible than the IPython environment.
• np.show_config() is too long to post, but in summary: the IPython envs are BLAS/LAPACK-based, while Colab is OpenBLAS-based. In the IPython Linux envs, the BLAS libraries are system-installed -- in Jupyter and Colab, they come from /opt/conda/lib (see the snippet below for how to check).
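
For reference, the linked BLAS can be checked directly from Python (output varies by install):

    import numpy as np
    np.show_config()  # prints the BLAS/LAPACK build info (e.g. MKL, OpenBLAS, system BLAS)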

UPDATE: the accepted answer is accurate, but broad and incomplete. The question remains open for anyone who can explain the behavior at the code level - namely, the exact algorithm used by np.dot, and how it explains the 'consistent inconsistencies' observed in the above results (also see comments). Here are some direct implementations beyond my deciphering: sdot.c -- arraytypes.c.src

Solution

This looks like unavoidable numeric imprecision. As explained here, NumPy uses a highly optimized, carefully tuned BLAS method for matrix multiplication. This means that the sequence of operations (sums and products) used to multiply two matrices likely changes when the size of the matrices changes.

To be clearer: we know that, mathematically, each element of the resulting matrix can be calculated as the dot product of two vectors (equal-length sequences of numbers). But this is not how NumPy computes an element of the resulting matrix. In fact, there are more efficient but complex algorithms, such as the Strassen algorithm, that obtain the same result without directly computing the row-column dot product.
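
A hedged way to observe 'different algorithm, different rounding' directly: compare BLAS-dispatched np.dot against a naive reduction over the same float32 data (np.einsum with its default optimize=False performs a plain summation loop rather than calling BLAS). Whether and where the results differ depends on your BLAS build:

    import numpy as np

    np.random.seed(1)
    a = np.random.randn(128, 128).astype('float32')
    b = np.random.randn(128, 128).astype('float32')

    blas  = np.dot(a, b)                  # dispatched to the BLAS sgemm kernel
    naive = np.einsum('ij,jk->ik', a, b)  # plain summation loop, no BLAS

    diff = np.abs(blas - naive)
    print(diff.max(), (diff > 0).mean())  # max gap and fraction of entries that differ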

When using such algorithms, even though the element C_ij of a resulting matrix C = AB is mathematically defined as the dot product of the i-th row of A with the j-th column of B, multiplying a matrix A2 (having the same i-th row as A) with a matrix B2 (having the same j-th column as B) computes the element C2_ij through a different sequence of operations (one that depends on the whole A2 and B2 matrices), possibly leading to different numerical errors.

That's why, even though mathematically C_ij = C2_ij (as in your CASE 1), the different sequence of operations followed by the algorithm (due to the change in matrix size) leads to different numerical errors. This also explains the slightly different results across environments, and the fact that in some cases, for some environments, the numerical error is absent.
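
Consistent with this explanation (a sketch, not a proof): every float32 variant from CASE 1 stays within a few float32 ULPs of a float64 reference, regardless of slice size; the slices simply land on different, equally valid roundings:

    import numpy as np

    np.random.seed(1)
    a = np.random.randn(9, 6).astype('float32')
    b = np.random.randn(6, 6).astype('float32')

    exact = np.dot(a.astype('float64'), b.astype('float64'))[0]  # float64 reference, row 0
    for i in range(1, len(a)):
        row  = np.dot(a[:i], b)[0].astype('float64')
        ulps = np.abs(row - exact) / np.spacing(np.abs(exact).astype('float32'))
        print(i, np.round(ulps, 2))  # each variant sits within a few ULPs of the reference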

