Sampling a few rows of a scipy sparse matrix into another

Problem description

How can I sample some of the rows of a scipy sparse matrix and form a new scipy sparse matrix from them?

For example, if I have a scipy sparse matrix A with 10 rows, how do I create a new scipy sparse matrix B made up of rows 1, 3 and 4 of A?

Solution

Left-multiply A by an appropriate indicator matrix: one with a row per selected index and a single 1 in the column of that index. The indicator matrix can be built with scipy.sparse.block_diag or directly in CSR format, as shown below.

>>> import numpy as np
>>> from scipy import sparse
>>> # create a 10 x 8 example matrix and the row indices to select
>>> m, n = 10, 8
>>> subset = [1,3,4]
>>> A = sparse.csr_matrix(np.random.randint(-10, 5, (m, n)).clip(0, None))
>>> A.toarray()
array([[3, 2, 4, 0, 0, 0, 2, 0],
       [0, 0, 2, 0, 0, 0, 0, 0],
       [4, 0, 0, 0, 0, 2, 0, 0],
       [0, 0, 0, 0, 0, 0, 4, 0],
       [3, 0, 0, 0, 1, 4, 0, 0],
       [0, 0, 0, 0, 0, 0, 2, 0],
       [0, 0, 0, 4, 0, 4, 4, 0],
       [0, 2, 0, 0, 0, 3, 0, 0],
       [4, 0, 3, 3, 0, 0, 0, 2],
       [4, 0, 0, 0, 0, 2, 0, 1]], dtype=int64)
>>> # build the indicator matrix
>>> # either using block_diag ...
>>> split_points = np.arange(len(subset)+1).repeat(np.diff(np.concatenate([[0], subset, [m-1]])))
>>> indicator = sparse.block_diag(np.split(np.ones(len(subset), int), split_points)).T
>>> indicator.toarray()
array([[0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]], dtype=int64)
>>> # ... or build the CSR arrays (data, indices, indptr) by hand; this also works
>>> # for unsorted or non-unique subsets and is therefore preferable to block_diag
>>> indicator = sparse.csr_matrix((np.ones(len(subset), int), subset, np.arange(len(subset)+1)), (len(subset), m))
>>> indicator.toarray()
array([[0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]])
>>> # apply: left-multiplying by the indicator picks out the selected rows
>>> result = indicator@A
>>> result.toarray()
array([[0, 0, 2, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 4, 0],
       [3, 0, 0, 0, 1, 4, 0, 0]], dtype=int64)
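
The reason this works: row i of the indicator holds a single 1 in column subset[i], so row i of the product is exactly row subset[i] of A. As a rough sketch, the manual CSR construction can be wrapped in a small helper and checked on an unsorted subset with a repeated index. The helper name select_rows, the seed, and the test subset are assumptions made for illustration and are not part of the original answer. The sketch also compares the result against plain integer-array row indexing, A[subset], which CSR matrices support and which may be all you need in simple cases.

```python
import numpy as np
from scipy import sparse


def select_rows(A, subset):
    """Return the rows of sparse matrix A listed in subset, in that order.

    select_rows is a hypothetical helper name, not a scipy function.
    """
    subset = np.asarray(subset)
    k, m = len(subset), A.shape[0]
    # CSR triple (data, indices, indptr): row i stores one entry,
    # a 1 in column subset[i], so indptr is simply 0, 1, ..., k.
    indicator = sparse.csr_matrix(
        (np.ones(k, dtype=A.dtype), subset, np.arange(k + 1)), shape=(k, m)
    )
    return indicator @ A


rng = np.random.default_rng(0)  # fixed seed only to make the sketch repeatable
A = sparse.csr_matrix(rng.integers(-10, 5, (10, 8)).clip(0, None))

# An unsorted subset with a repeated index
subset = [4, 1, 4, 3]
B = select_rows(A, subset)

# Direct row indexing with a list of integers, for comparison
C = A[subset]
same = np.array_equal(B.toarray(), C.toarray())
print(B.shape, same)

# Drawing the row indices at random, without replacement
sampled = rng.choice(A.shape[0], size=3, replace=False)
D = select_rows(A, sampled)
print(D.shape)
```

Both routes return a sparse matrix with one row per requested index. The indicator matrix is mainly worth keeping around when the same selection is applied to several matrices, or when the selection needs to be combined with other matrix products.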
