Is there a Python library for sparse matrix operations for non-standard algebra-like objects?


Question

Summary: I am looking for a way to do computations with sparse matrices whose non-zero entries are not the usual integers/floats/etc., but elements of an algebra, i.e. instances of a non-standard Python class with addition, multiplication and a zero element.

It works fine for dense matrices. I have implemented this algebra by defining a Python class algebra and overloading addition and multiplication:

class algebra(object):
    ...
    def __mul__(self, other):
        ...
    def __add__(self, other):
        ...

numpy allows me to define vectors and matrices whose entries are instances of the class algebra. It also allows me to perform all the usual operations like matrix multiplication/addition/tensordot/slicing/etc., so everything works just as it does for matrices over integers/floats/etc.

It does not work for sparse matrices. To speed up computations, I would now like to replace these dense matrices with sparse ones. I have tried to make this work with SciPy's 2-D sparse matrix package scipy.sparse, but I have failed so far. I can populate instances of these sparse matrix classes with my algebra elements, but whenever I do computations with them, I get an error message like

TypeError: no supported conversion for types: (dtype('O'),dtype('O'))

To me, this suggests that there is a restriction on the type of objects that are supported by scipy.sparse. I do not see any mathematical reason why the operations for sparse matrices should care about the object type. As long as the class has all the operations of floats, say, it should work. What am I missing? Is there an alternative to scipy.sparse which supports arbitrary object types?

Below is a minimal working example. Note that I have implemented the zero element of the algebra in terms of the usual integer 0. Please also note that the actual algebra I am interested in is more complicated than the ordinary integers!

import numpy as np
from scipy.sparse import csr_matrix

class algebra(object): # the algebra of the real integers

    def __init__(self,num):
        self.num = num

    def __add__(self,other):
        if isinstance(other, self.__class__):
            return algebra(self.num+other.num)
        else:
            return self

    def __radd__(self,other):
        if isinstance(other, self.__class__):
            return algebra(self.num+other.num)
        else:
            return self

    def __mul__(self,other):
        if isinstance(other, self.__class__):
            return algebra(self.num*other.num)
        else:
            return 0

    def __rmul__(self,other):
        if isinstance(other, self.__class__):
            return algebra(self.num*other.num)
        else:
            return 0

    def __repr__(self):
        return "algebra:"+str(self.num)  

a=algebra(5)
print(a*a)
print(a*0)
print(0*a)
indptr = np.array([0, 2, 3, 6])
indices = np.array([0, 2, 2, 0, 1, 2])
data = np.array([a,a,a,a,a,a])
S = csr_matrix((data, indices, indptr), shape=(3, 3))
print(S)
print("Everything works fine up to here.")
S*S    

The output is:

algebra:25
0
0
  (0, 0)    algebra:5
  (0, 2)    algebra:5
  (1, 2)    algebra:5
  (2, 0)    algebra:5
  (2, 1)    algebra:5
  (2, 2)    algebra:5
Everything works fine up to here.
Traceback (most recent call last):
  File "test", line 46, in <module>
    S*S    
  File "/usr/lib/python3/dist-packages/scipy/sparse/base.py", line 319, in __mul__
    return self._mul_sparse_matrix(other)
  File "/usr/lib/python3/dist-packages/scipy/sparse/compressed.py", line 499, in _mul_sparse_matrix
    data = np.empty(nnz, dtype=upcast(self.dtype, other.dtype))
  File "/usr/lib/python3/dist-packages/scipy/sparse/sputils.py", line 57, in upcast
    raise TypeError('no supported conversion for types: %r' % (args,))
TypeError: no supported conversion for types: (dtype('O'), dtype('O'))

I am using Python 3.5.2 on Linux.

Recommended answer

This may fall more in the comment category, but as an answer I can make it longer and edit it more.

numpy arrays implement object dtype by storing pointers/references to the objects in the array's data buffer. Math is done by delegating the task to the objects' own methods. The iteration runs essentially at Python speed, comparable to a list comprehension (maybe even a bit slower). numpy does not do its fast compiled math on these objects.
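The delegation described above can be made visible with a small sketch. The class Counted below is hypothetical (it is not part of the question's code); it only counts how often its __mul__ method runs when two object arrays are multiplied.

```python
import numpy as np

class Counted:
    """Hypothetical helper: wraps a number and counts __mul__ calls."""
    calls = 0

    def __init__(self, num):
        self.num = num

    def __mul__(self, other):
        Counted.calls += 1
        return Counted(self.num * other.num)

# An object-dtype array stores references to the four Python objects.
arr = np.array([Counted(i) for i in range(4)], dtype=object)

# Elementwise multiply: numpy calls Counted.__mul__ once per element pair.
prod = arr * arr

print(prod.dtype, Counted.calls, [p.num for p in prod])
```

The result is again an object array, and the counter shows one Python-level method call per element, which is why no compiled speed-up is available here.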

scipy.sparse has not developed this kind of functionality. A coo format matrix can probably be created with the object inputs - but that's because it doesn't do much. In fact, if the data, row and col inputs have the right numpy array setup, they are used as coo attributes without change.

Apparently, making a csr as you do, with indptr etc., also just assigns the attributes. A coo to csr conversion might not work so well, since that involves summation of duplicates.

In any case, the csr math code uses a mix of Python and C (cython), and the compiled part works with a limited number of numeric types, such as long integers and double-precision floats. I don't think it even works for short ints (int8, int16). It does not implement any of the object dtype delegation that ndarrays do.
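If only Python-level speed is needed, sparse multiplication over arbitrary objects can be written by hand. The following is a minimal sketch, not a scipy.sparse API: the dict-of-keys storage, the function name sparse_matmul and the simplified Algebra class are all assumptions made for illustration. It requires nothing from the entries except __add__ and __mul__, and it touches only stored entries.

```python
from collections import defaultdict

class Algebra:
    """Simplified stand-in for the question's class (wraps an integer)."""
    def __init__(self, num):
        self.num = num

    def __add__(self, other):
        return Algebra(self.num + other.num)

    def __mul__(self, other):
        return Algebra(self.num * other.num)

    def __repr__(self):
        return "algebra:%d" % self.num

def sparse_matmul(A, B):
    """Multiply sparse matrices stored as {(row, col): value} dicts."""
    # Group the stored entries of B by row for fast lookup.
    b_rows = defaultdict(list)
    for (k, j), b in B.items():
        b_rows[k].append((j, b))
    C = {}
    for (i, k), a in A.items():
        for j, b in b_rows.get(k, ()):
            term = a * b
            # Accumulate with the objects' own __add__; no zero element needed.
            C[(i, j)] = C[(i, j)] + term if (i, j) in C else term
    return C

a = Algebra(5)
# Same sparsity pattern as the csr_matrix S in the question.
S = {(0, 0): a, (0, 2): a, (1, 2): a, (2, 0): a, (2, 1): a, (2, 2): a}
P = sparse_matmul(S, S)
print(P[(0, 0)], P[(2, 2)])
```

Because accumulation starts from the first product rather than from 0, this sketch never needs the algebra's zero element. It will be slow for large matrices, since every operation is a Python method call.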

With your S:

In [187]: S.A                                                                                                
...
ValueError: unsupported data types in input

In [188]: S.tocoo()                                                                                          
Out[188]: 
<3x3 sparse matrix of type '<class 'numpy.object_'>'
    with 6 stored elements in COOrdinate format>

No value changes are required for tocoo. But converting back to csr requires summing duplicates:

In [189]: S.tocoo().tocsr()                                                                                  
 ...
TypeError: no supported conversion for types: (dtype('O'),)

In [190]: S.tolil()                                                                                          
/usr/local/lib/python3.6/dist-packages/scipy/sparse/sputils.py:115: UserWarning: object dtype is not supported by sparse matrices
  warnings.warn("object dtype is not supported by sparse matrices")
Out[190]: 
<3x3 sparse matrix of type '<class 'numpy.object_'>'
    with 6 stored elements in LInked List format>

There's no problem in storing this object data.

Math with a list of your objects versus an array takes similar times:

In [192]: alist = [a]*100                                                                                    
In [193]: arr = np.array(alist)                                                                              
In [194]: timeit [i*j for i,j in zip(alist,alist)]                                                           
77.9 µs ± 272 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [195]: timeit arr*arr                                                                                     
75.1 µs ± 2.29 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

An earlier question, which you may have already seen (I just got an upvote), about using int16 in sparse matrices. Same basic issue:

Why can't I assign data to part of a sparse matrix in the first "try:"?

The symbolic math library SymPy has a sparse matrix module: https://docs.sympy.org/latest/modules/matrices/sparse.html

Pandas has its own sparse Series/DataFrame implementations.

https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.coo_matrix.html#scipy.sparse.coo_matrix

By default when converting to CSR or CSC format, duplicate (i,j) entries will be summed together. This facilitates efficient construction of finite element matrices and the like. (see example)
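The duplicate summation quoted above is the step that needs arithmetic on the entries. As a rough illustration of what it does, here is a pure-Python sketch for coo-style triplets; the function name sum_duplicates is hypothetical, and fractions.Fraction merely stands in for any class that defines __add__.

```python
from fractions import Fraction

def sum_duplicates(rows, cols, vals):
    """Merge coo-style triplets, adding values that share the same (i, j)."""
    merged = {}
    for i, j, v in zip(rows, cols, vals):
        key = (i, j)
        # Uses the value's own __add__, so any class with addition works.
        merged[key] = merged[key] + v if key in merged else v
    return merged

rows = [0, 0, 1, 0]
cols = [0, 2, 1, 0]
vals = [Fraction(1, 2), Fraction(1, 3), Fraction(2), Fraction(1, 4)]

# The two (0, 0) entries are added: 1/2 + 1/4 = 3/4.
merged = sum_duplicates(rows, cols, vals)
print(merged)
```

scipy does this step in compiled code for its supported numeric dtypes, which is consistent with the tocoo().tocsr() failure shown earlier for object dtype.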
