Reasons for the slowness of numpy.dot() and how to mitigate them when custom classes are used?


Question

I am profiling a numpy dot product call.

numpy.dot(pseudo,pseudo)

pseudo is a numpy array of custom objects. Defined as:

pseudo = numpy.array(
         [[PseudoBinary(1), PseudoBinary(0), PseudoBinary(1)],
          [PseudoBinary(1), PseudoBinary(0), PseudoBinary(0)],
          [PseudoBinary(1), PseudoBinary(0), PseudoBinary(1)]])

PseudoBinary is a class with a custom multiplication: it ORs its operands instead of multiplying them. The complete definition of PseudoBinary is given below.

Type:

(Pdb) pseudo.dtype
dtype('O')

According to my profiling, the pseudo dot product is about 500 times slower than a dot product of matrices with integer values. A pointer to the profiling code is given below.

I am interested in the reasons for the slowness and in whether there are ways to mitigate them.

Some possible reasons for the slowness:

  • The memory layout of pseudo would not use contiguous memory. According to this, numpy uses pointers with object types. During matrix multiplication, a bunch of pointer dereferences may occur instead of reads directly from contiguous memory.

  • Numpy multiplication may not use the optimized internal compiled implementations (BLAS, ATLAS etc.). According to this, various conditions must hold for the optimized implementation to be used. Using custom objects may break those conditions.

Are there other factors in play? Any recommendations for improvement?

The starting point of all this was this question. There, the OP is looking for a "custom dot product": an operation that visits the elements of two matrices the way the dot product does, but does something other than multiplying the corresponding elements of rows and columns. In an answer, I recommended a custom object that overrides the __mul__ function. But numpy.dot is very slow with that approach. The code that does the performance measurement can be seen in that answer too.

Code showing the PseudoBinary class and dot product execution.

#!/usr/bin/env python

from __future__ import absolute_import
from __future__ import print_function
import numpy


class PseudoBinary(object):
    def __init__(self, i):
        self.i = i

    # "Multiplication" is redefined as a logical OR of the wrapped values.
    def __mul__(self, rhs):
        return PseudoBinary(self.i or rhs.i)

    __rmul__ = __mul__
    __imul__ = __mul__

    # Addition stays ordinary integer addition of the wrapped values.
    def __add__(self, rhs):
        return PseudoBinary(self.i + rhs.i)

    __radd__ = __add__
    __iadd__ = __add__

    def __str__(self):
        return "P" + str(self.i)

    __repr__ = __str__


# Plain integer matrix, used as the baseline.
base = numpy.array(
      [[1, 0, 1],
       [1, 0, 0],
       [1, 0, 1]])

# Same values wrapped in PseudoBinary; numpy stores this with dtype('O').
pseudo = numpy.array(
         [[PseudoBinary(1), PseudoBinary(0), PseudoBinary(1)],
          [PseudoBinary(1), PseudoBinary(0), PseudoBinary(0)],
          [PseudoBinary(1), PseudoBinary(0), PseudoBinary(1)]])

baseRes = numpy.dot(base, base)
pseudoRes = numpy.dot(pseudo, pseudo)

print("baseRes\n", baseRes)
print("pseudoRes\n", pseudoRes)

Prints:

baseRes
 [[2 0 2]
 [1 0 1]
 [2 0 2]]
pseudoRes
 [[P3 P2 P2]
 [P3 P1 P2]
 [P3 P2 P2]]
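
The slowdown mentioned in the question can be checked with a rough timing sketch like the one below. It is not the original profiling code: the repeat count is an arbitrary choice, the class is a trimmed copy of PseudoBinary, and the measured ratio will vary with the machine, the NumPy build, and the matrix size.

```python
import timeit

import numpy


class PseudoBinary(object):
    """Trimmed copy of the class above: * is OR, + is integer addition."""

    def __init__(self, i):
        self.i = i

    def __mul__(self, rhs):
        return PseudoBinary(self.i or rhs.i)

    def __add__(self, rhs):
        return PseudoBinary(self.i + rhs.i)


values = [[1, 0, 1], [1, 0, 0], [1, 0, 1]]
base = numpy.array(values)
pseudo = numpy.array([[PseudoBinary(v) for v in row] for row in values],
                     dtype=object)

repeats = 2000  # arbitrary; raise it for steadier numbers

# Time the same call on the integer array and on the object array.
t_base = timeit.timeit(lambda: numpy.dot(base, base), number=repeats)
t_pseudo = timeit.timeit(lambda: numpy.dot(pseudo, pseudo), number=repeats)

print("int dot:    %.6f s" % t_base)
print("object dot: %.6f s" % t_pseudo)
print("ratio:      %.1fx" % (t_pseudo / t_base))
```

On a 3x3 input the fixed call overhead of numpy.dot is a large part of both timings, so larger matrices give a more representative ratio.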

Recommended answer

Pretty much anything you do with object arrays is going to be slow. None of the reasons NumPy is usually fast apply to object arrays.

  • Object arrays cannot store their elements contiguously. They must store and dereference pointers.
    • They don't know how much space they would have to allocate for their elements.
    • Their elements may not all be the same size.
    • The elements you insert into an object array have already been allocated outside the array, and they cannot be copied.
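
The pointer storage described above can be observed directly. The following is a small sketch, with a hypothetical Token class standing in for any custom object; it shows that an object array holds references to objects that live outside the array, while an integer array holds the values themselves.

```python
import numpy


class Token(object):
    """Hypothetical stand-in for any custom class."""

    def __init__(self, i):
        self.i = i


items = [Token(1), Token(0), Token(1)]
obj_arr = numpy.array(items, dtype=object)
int_arr = numpy.array([1, 0, 1], dtype=numpy.int64)

# Each slot of the object array is one pointer-sized reference.
print(obj_arr.itemsize, numpy.dtype(numpy.intp).itemsize)

# The array refers to the very same objects; nothing was copied into it.
print(obj_arr[0] is items[0])

# A change made through the original list is visible through the array.
items[0].i = 42
print(obj_arr[0].i)

# The integer array stores 8-byte values inline in its own buffer.
print(int_arr.itemsize)
```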

Basically, every reason doing Python math without NumPy is slow also applies to doing anything with object arrays.

As for how to improve performance: don't use object arrays. Use regular arrays, and either find a way to implement what you want in terms of the operations NumPy provides, or write out the loops explicitly and use something like Numba or Cython to compile your code.
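
As one possible reading of "in terms of the operations NumPy provides", the OR-then-sum product from the question can be sketched with broadcasting on a plain integer array. The function name or_sum_dot is made up for this illustration, and the sketch assumes the matrices hold only 0 and 1, so that Python's `or` and numpy.logical_or agree.

```python
import numpy

base = numpy.array(
      [[1, 0, 1],
       [1, 0, 0],
       [1, 0, 1]])


def or_sum_dot(a, b):
    """Return out[i, j] = sum over k of (a[i, k] OR b[k, j]).

    Assumes a and b contain only 0 and 1 (hypothetical helper, not a NumPy API).
    """
    # a[:, :, None] has shape (n, k, 1) and b[None, :, :] has shape (1, k, m);
    # broadcasting pairs every a[i, k] with every b[k, j].
    pairwise = numpy.logical_or(a[:, :, None], b[None, :, :])
    # Summing over the shared axis k replaces the additions done by dot.
    return pairwise.sum(axis=1)


result = or_sum_dot(base, base)
print(result)
```

For this input the values match the pseudoRes output shown in the question (3, 2, 2 / 3, 1, 2 / 3, 2, 2). The intermediate array has n*k*m elements, so for large matrices the explicit-loop route with Numba or Cython mentioned in the answer may be the better fit.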
