What's the fastest way to find eigenvalues/vectors in Python?

Problem description

Currently I'm using NumPy, which does the job. But as I'm dealing with matrices with several thousand rows/columns, and later this figure will go up to tens of thousands, I was wondering if there is a package in existence that can perform this kind of calculation faster?

Solution

  • if your matrix is sparse, then instantiate your matrix using a constructor from scipy.sparse, and use the analogous eigenvector/eigenvalue methods in scipy.sparse.linalg (see the sketch after this list). From a performance point of view, this has two advantages:

    • your matrix, built from the scipy.sparse constructor, will be smaller in proportion to how sparse it is.

    • the eigenvalue/eigenvector methods for sparse matrices (eigs, eigsh) accept an optional argument, k, which is the number of eigenvector/eigenvalue pairs you want returned. Nearly always, the number required to account for >99% of the variance is far less than the number of columns, which you can verify ex post; in other words, you can tell the method not to calculate and return all of the eigenvector/eigenvalue pairs—beyond the (usually) small subset required to account for the variance, it's unlikely you need the rest.

  • use the linear algebra library in SciPy, scipy.linalg, instead of the NumPy library of the same name. These two libraries have the same name and use the same method names, yet there's a difference in performance. This difference arises because numpy.linalg is a less faithful wrapper around the analogous LAPACK routines, sacrificing some performance for portability and convenience (i.e., to comply with the NumPy design goal that the entire NumPy library should be buildable without a Fortran compiler). linalg in SciPy, on the other hand, is a much more complete wrapper around LAPACK, and it uses f2py.

  • select the function appropriate for your use case; in other words, don't use a function that does more than you need. In scipy.linalg there are several functions to calculate eigenvalues; the differences are not large, but by carefully choosing the function that calculates the eigenvalues, you should see a performance boost. For instance:

    • scipy.linalg.eig returns both the eigenvalues and eigenvectors
    • scipy.linalg.eigvals returns only the eigenvalues. So if you only need the eigenvalues of a matrix, do not use linalg.eig; use linalg.eigvals instead.
    • if you have a real-valued square symmetric matrix (equal to its transpose), then use scipy.linalg.eigh (the analogous routine for sparse matrices is scipy.sparse.linalg.eigsh)
  • optimize your SciPy build. Preparing your SciPy build environment is done largely in SciPy's setup.py script. Perhaps the most significant option performance-wise is identifying any optimized LAPACK libraries, such as ATLAS or the Accelerate/vecLib framework (OS X only?), so that SciPy can detect them and build against them. Depending on the rig you have at the moment, optimizing your SciPy build and then re-installing can give you a substantial performance increase. Additional notes from the SciPy core team are here.
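To make the sparse-matrix advice above concrete, here is a minimal sketch; the matrix size (5000), density (0.001), and k=6 are arbitrary values chosen only for the illustration, and SLA is simply a local alias for scipy.sparse.linalg:

>>> from scipy import sparse
>>> from scipy.sparse import linalg as SLA

>>> # build a sparse random matrix; size and density are arbitrary for this sketch
>>> X = sparse.random(5000, 5000, density=0.001, format='csr', random_state=0)

>>> # symmetrize it so eigsh (for symmetric/Hermitian problems) applies
>>> A_sparse = (X + X.T) * 0.5

>>> # request only the k largest-magnitude eigenpairs instead of all 5000
>>> e_vals, e_vecs = SLA.eigsh(A_sparse, k=6)

>>> e_vecs.shape
    (5000, 6)

Separately, on the build-optimization point: to check which BLAS/LAPACK an already-installed NumPy/SciPy pair is linked against, numpy.show_config() and scipy.show_config() print the detected libraries.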

Will these functions work for large matrices?

I should think so. These are industrial-strength matrix decomposition methods, which are just thin wrappers over the analogous Fortran LAPACK routines.

I have used most of the methods in the linalg library to decompose matrices in which the number of columns is usually between about 5 and 50, and in which the number of rows usually exceeds 500,000. Neither the SVD nor the eigenvalue methods seem to have any problem handling matrices of this size.
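As a hedged sketch of that kind of tall, skinny decomposition (the dimensions here are shrunk to 10,000 × 20 so it runs instantly; they merely stand in for the 500,000-row case described above), using the economy SVD so the left singular vectors are not materialized as an enormous square matrix:

>>> import numpy as NP
>>> from scipy import linalg as LA

>>> # tall, skinny matrix: many rows, few columns (scaled down for the sketch)
>>> M = NP.random.rand(10000, 20)

>>> # economy SVD: with full_matrices=False, U has shape (10000, 20), not (10000, 10000)
>>> U, s, Vt = LA.svd(M, full_matrices=False)
>>> U.shape, s.shape, Vt.shape
    ((10000, 20), (20,), (20, 20))

>>> # eigendecomposition of the small 20 x 20 column covariance matrix via eigh
>>> C = NP.cov(M, rowvar=False)
>>> e_vals, e_vecs = LA.eigh(C)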

Using the SciPy library scipy.linalg, you can calculate eigenvectors and eigenvalues with a single call, using any of several methods from this library: eig, eigvalsh, and eigh.

>>> import numpy as NP
>>> from scipy import linalg as LA

>>> A = NP.random.randint(0, 10, 25).reshape(5, 5)
>>> A
    array([[9, 5, 4, 3, 7],
           [3, 3, 2, 9, 7],
           [6, 5, 3, 4, 0],
           [7, 3, 5, 5, 5],
           [2, 5, 4, 7, 8]])

>>> e_vals, e_vecs = LA.eig(A)
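Continuing the same session, a short sketch of the "use the narrowest function" advice above: eigvals when only the eigenvalues are wanted, and eigh once the matrix is symmetric (A is symmetrized here purely to manufacture such a matrix for the example):

>>> # only the eigenvalues of A -- no eigenvectors are computed or returned
>>> e_vals_only = LA.eigvals(A)

>>> # for a real symmetric matrix, eigh is the narrower, cheaper routine;
>>> # A is symmetrized here just to build such a matrix for the illustration
>>> S = (A + A.T) / 2.0
>>> e_vals_sym, e_vecs_sym = LA.eigh(S)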
