切片稀疏(scipy)矩阵 [英] slicing sparse (scipy) matrix
问题描述
在理解从scipy.sparse包中切片lil_matrix(A)时的以下行为,我将不胜感激.
I would appreciate any help, to understand following behavior when slicing a lil_matrix (A) from the scipy.sparse package.
实际上,我想基于行和列的任意索引列表提取一个子矩阵.
Actually, I would like to extract a submatrix based on an arbitrary index list for both rows and columns.
当我使用这两行代码时:
When I used this two lines of code:
x1 = A[list 1,:]
x2 = x1[:,list 2]
一切都很好,我可以提取正确的子矩阵.
Everything was fine and I could extract the right submatrix.
当我尝试在一行中执行此操作时,它失败了(返回的矩阵为空)
When I tried to do this in one line, it failed (The returning matrix was empty)
x=A[list 1,list 2]
为什么会这样?总的来说,我在matlab中使用了类似的命令,并且可以正常工作. 那么,为什么不使用第一个,因为它可以工作呢?这似乎很耗时.由于我必须处理大量条目,因此我想使用一个命令来加快它的速度.也许我使用了错误的稀疏矩阵类型...知道吗?
Why is this so? Overall, I have used a similar command in matlab and there it works. So, why not use the first, since it works? It seems to be quite time consuming. Since I have to go through a large amount of entries, I would like to speed it up using a single command. Maybe I use the wrong sparse matrix type...Any idea?
推荐答案
您正在使用的方法
A[list1, :][:, list2]
似乎是从备用矩阵中选择所需值的最快方法.请参阅下面的基准.
seems to be the fastest way to select the desired values from a spares matrix. See below for a benchmark.
但是,要回答有关如何从具有单个索引的A
的任意行和列中选择值的问题,
您将需要使用所谓的高级索引" :
However, to answer your question about how to select values from arbitrary rows and columns of A
with a single index,
you would need to use so-called "advanced indexing":
A[np.array(list1)[:,np.newaxis], np.array(list2)]
使用高级索引,如果arr1
和arr2
是NDarrays,则A[arr1, arr2]
的(i,j)
分量等于
With advanced indexing, if arr1
and arr2
are NDarrays, the (i,j)
component of A[arr1, arr2]
equals
A[arr1[i,j], arr2[i,j]]
因此,您希望所有j
的arr1[i,j]
等于list1[i]
,并且
对于所有i
,arr2[i,j]
等于list2[j]
.
Thus you would want arr1[i,j]
to equal list1[i]
for all j
, and
arr2[i,j]
to equal list2[j]
for all i
.
可以通过设置广播(请参见下文)来进行安排(见下文)
arr1 = np.array(list1)[:,np.newaxis]
和arr2 = np.array(list2)
.
That can be arranged with the help of broadcasting (see below) by setting
arr1 = np.array(list1)[:,np.newaxis]
, and arr2 = np.array(list2)
.
arr1
的形状为(len(list1), 1)
,而arr2
的形状为
(len(list2), )
广播到(1, len(list2))
,因为添加了新轴
会在需要时自动显示在左侧.
The shape of arr1
is (len(list1), 1)
while the shape of arr2
is
(len(list2), )
which broadcasts to (1, len(list2))
since new axes are added
on the left automatically when needed.
每个数组可以进一步广播以塑造(len(list1),len(list2))
形状.
这正是我们想要的
A[arr1[i,j],arr2[i,j]]
有意义,因为我们希望(i,j)
遍历形状为(len(list1),len(list2))
的结果数组的所有可能的索引.
Each array can be further broadcasted to shape (len(list1),len(list2))
.
This is exactly what we want for
A[arr1[i,j],arr2[i,j]]
to make sense, since we want (i,j)
to run over all possible indices for a result array of shape (len(list1),len(list2))
.
这是一个测试案例的微基准,表明A[list1, :][:, list2]
是最快的选择:
Here is a microbenchmark for one test case which suggests that A[list1, :][:, list2]
is the fastest option:
In [32]: %timeit orig(A, list1, list2)
10 loops, best of 3: 110 ms per loop
In [34]: %timeit using_listener(A, list1, list2)
1 loop, best of 3: 1.29 s per loop
In [33]: %timeit using_advanced_indexing(A, list1, list2)
1 loop, best of 3: 1.8 s per loop
这是我用于基准测试的设置:
Here is the setup I used for the benchmark:
import numpy as np
import scipy.sparse as sparse
import random
random.seed(1)
def setup(N):
A = sparse.rand(N, N, .1, format='lil')
list1 = np.random.choice(N, size=N//10, replace=False).tolist()
list2 = np.random.choice(N, size=N//20, replace=False).tolist()
return A, list1, list2
def orig(A, list1, list2):
return A[list1, :][:, list2]
def using_advanced_indexing(A, list1, list2):
B = A.tocsc() # or `.tocsr()`
B = B[np.array(list1)[:, np.newaxis], np.array(list2)]
return B
def using_listener(A, list1, list2):
"""https://stackoverflow.com/a/26592783/190597 (listener)"""
B = A.tocsr()[list1, :].tocsc()[:, list2]
return B
N = 10000
A, list1, list2 = setup(N)
B = orig(A, list1, list2)
C = using_advanced_indexing(A, list1, list2)
D = using_listener(A, list1, list2)
assert np.allclose(B.toarray(), C.toarray())
assert np.allclose(B.toarray(), D.toarray())
这篇关于切片稀疏(scipy)矩阵的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!