切片稀疏(scipy)矩阵 [英] slicing sparse (scipy) matrix

查看：246 发布时间：2020/7/10 3:05:04 python scipy slice sparse-matrix submatrix

本文介绍了切片稀疏(scipy)矩阵的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

在理解从scipy.sparse包中切片lil_matrix(A)时的以下行为，我将不胜感激.

I would appreciate any help, to understand following behavior when slicing a lil_matrix (A) from the scipy.sparse package.

实际上，我想基于行和列的任意索引列表提取一个子矩阵.

Actually, I would like to extract a submatrix based on an arbitrary index list for both rows and columns.

当我使用这两行代码时:

When I used this two lines of code:

x1 = A[list 1,:]
x2 = x1[:,list 2]

一切都很好，我可以提取正确的子矩阵.

Everything was fine and I could extract the right submatrix.

当我尝试在一行中执行此操作时，它失败了(返回的矩阵为空)

When I tried to do this in one line, it failed (The returning matrix was empty)

x=A[list 1,list 2]

为什么会这样?总的来说，我在matlab中使用了类似的命令，并且可以正常工作. 那么，为什么不使用第一个，因为它可以工作呢?这似乎很耗时.由于我必须处理大量条目，因此我想使用一个命令来加快它的速度.也许我使用了错误的稀疏矩阵类型...知道吗?

Why is this so? Overall, I have used a similar command in matlab and there it works. So, why not use the first, since it works? It seems to be quite time consuming. Since I have to go through a large amount of entries, I would like to speed it up using a single command. Maybe I use the wrong sparse matrix type...Any idea?

推荐答案

您正在使用的方法

A[list1, :][:, list2]

似乎是从备用矩阵中选择所需值的最快方法.请参阅下面的基准.

seems to be the fastest way to select the desired values from a spares matrix. See below for a benchmark.

但是，要回答有关如何从具有单个索引的A 的任意行和列中选择值的问题，您将需要使用所谓的高级索引" :

However, to answer your question about how to select values from arbitrary rows and columns of A with a single index, you would need to use so-called "advanced indexing":

A[np.array(list1)[:,np.newaxis], np.array(list2)]

使用高级索引，如果arr1和arr2是NDarrays，则A[arr1, arr2]的(i,j)分量等于

With advanced indexing, if arr1 and arr2 are NDarrays, the (i,j) component of A[arr1, arr2] equals

A[arr1[i,j], arr2[i,j]]

因此，您希望所有j的arr1[i,j]等于list1[i]，并且对于所有i，arr2[i,j]等于list2[j].

Thus you would want arr1[i,j] to equal list1[i] for all j, and arr2[i,j] to equal list2[j] for all i.

可以通过设置广播(请参见下文)来进行安排(见下文) arr1 = np.array(list1)[:,np.newaxis]和arr2 = np.array(list2).

That can be arranged with the help of broadcasting (see below) by setting arr1 = np.array(list1)[:,np.newaxis], and arr2 = np.array(list2).

arr1的形状为(len(list1), 1)，而arr2的形状为 (len(list2), )广播到(1, len(list2))，因为添加了新轴会在需要时自动显示在左侧.

The shape of arr1 is (len(list1), 1) while the shape of arr2 is (len(list2), ) which broadcasts to (1, len(list2)) since new axes are added on the left automatically when needed.

每个数组可以进一步广播以塑造(len(list1),len(list2))形状. 这正是我们想要的 A[arr1[i,j],arr2[i,j]]有意义，因为我们希望(i,j)遍历形状为(len(list1),len(list2))的结果数组的所有可能的索引.

Each array can be further broadcasted to shape (len(list1),len(list2)). This is exactly what we want for A[arr1[i,j],arr2[i,j]] to make sense, since we want (i,j) to run over all possible indices for a result array of shape (len(list1),len(list2)).

这是一个测试案例的微基准，表明A[list1, :][:, list2]是最快的选择:

Here is a microbenchmark for one test case which suggests that A[list1, :][:, list2] is the fastest option:

In [32]: %timeit orig(A, list1, list2) 10 loops, best of 3: 110 ms per loop In [34]: %timeit using_listener(A, list1, list2) 1 loop, best of 3: 1.29 s per loop In [33]: %timeit using_advanced_indexing(A, list1, list2) 1 loop, best of 3: 1.8 s per loop

这是我用于基准测试的设置:

Here is the setup I used for the benchmark:

import numpy as np import scipy.sparse as sparse import random random.seed(1) def setup(N): A = sparse.rand(N, N, .1, format='lil') list1 = np.random.choice(N, size=N//10, replace=False).tolist() list2 = np.random.choice(N, size=N//20, replace=False).tolist() return A, list1, list2 def orig(A, list1, list2): return A[list1, :][:, list2] def using_advanced_indexing(A, list1, list2): B = A.tocsc() # or `.tocsr()` B = B[np.array(list1)[:, np.newaxis], np.array(list2)] return B def using_listener(A, list1, list2): """https://stackoverflow.com/a/26592783/190597 (listener)""" B = A.tocsr()[list1, :].tocsc()[:, list2] return B N = 10000 A, list1, list2 = setup(N) B = orig(A, list1, list2) C = using_advanced_indexing(A, list1, list2) D = using_listener(A, list1, list2) assert np.allclose(B.toarray(), C.toarray()) assert np.allclose(B.toarray(), D.toarray())

这篇关于切片稀疏(scipy)矩阵的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

切片稀疏(scipy)矩阵 [英] slicing sparse (scipy) matrix

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录关闭

切片稀疏(scipy)矩阵 [英] slicing sparse (scipy) matrix

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录 关闭

登录关闭