Rearrange sparse arrays by swapping rows and columns

Problem description

I have large but sparse arrays, and I want to rearrange them by swapping rows and columns. What is a good way to do this in scipy.sparse?

Some issues

  • I don't think permutation matrices are well suited to this task, as they seem to change the sparsity structure unpredictably. A multiplication will also always touch all columns or rows, even if only a few swaps are necessary.

What is the best sparse matrix representation in scipy.sparse for this task?

Suggestions for implementation are very welcome.

I have tagged this with Matlab as well, since an answer to this question is not necessarily scipy-specific.

Recommended answer

The CSC format keeps a list of the row indices of all non-zero entries, and the CSR format keeps a list of the column indices of all non-zero entries. I think you can take advantage of that to swap things around as follows, and I think there shouldn't be any side-effects to it:

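import numpy as np
import scipy.sparse

def swap_rows(mat, a, b):
    # CSC keeps the row index of every non-zero entry in `indices`.
    # copy=True leaves the caller's matrix untouched if it is already CSC.
    mat_csc = scipy.sparse.csc_matrix(mat, copy=True)
    a_idx = np.where(mat_csc.indices == a)
    b_idx = np.where(mat_csc.indices == b)
    mat_csc.indices[a_idx] = b
    mat_csc.indices[b_idx] = a
    # Relabelled row indices may no longer be sorted within a column.
    mat_csc.has_sorted_indices = False
    return mat_csc.asformat(mat.format)

def swap_cols(mat, a, b):
    # CSR keeps the column index of every non-zero entry in `indices`.
    mat_csr = scipy.sparse.csr_matrix(mat, copy=True)
    a_idx = np.where(mat_csr.indices == a)
    b_idx = np.where(mat_csr.indices == b)
    mat_csr.indices[a_idx] = b
    mat_csr.indices[b_idx] = a
    # Relabelled column indices may no longer be sorted within a row.
    mat_csr.has_sorted_indices = False
    return mat_csr.asformat(mat.format)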
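
To see why rewriting `indices` is enough, it can help to look at the raw arrays behind a small CSC matrix. The following is only a sketch: the 4x3 matrix is made up for illustration and does not come from the question.

```python
import numpy as np
import scipy.sparse

# Hypothetical example matrix, chosen only for illustration.
dense = np.array([[0., 0., 5.],
                  [1., 0., 0.],
                  [0., 2., 0.],
                  [3., 0., 4.]])
csc = scipy.sparse.csc_matrix(dense)

# CSC stores the non-zeros column by column:
print(csc.data)     # the values:                  [1. 3. 2. 5. 4.]
print(csc.indices)  # row index of each value:     [1 3 2 0 3]
print(csc.indptr)   # start offset of each column: [0 2 3 5]

# Swapping rows 1 and 3 only relabels entries of `indices`;
# `data` and `indptr` are not modified by the relabelling.
swapped = csc.copy()
rows = swapped.indices.copy()
swapped.indices[rows == 1] = 3
swapped.indices[rows == 3] = 1
# Tell scipy the row indices may now be out of order within a column.
swapped.has_sorted_indices = False

print(swapped.toarray())
```

The same reasoning applies to CSR, where `indices` holds column indices, so relabelling them swaps columns.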

You could now do something like this:

>>> import numpy as np
>>> import scipy.sparse
>>> mat = np.zeros((5,5))
>>> mat[[1, 2, 3, 3], [0, 2, 2, 4]] = 1
>>> mat = scipy.sparse.lil_matrix(mat)
>>> mat.todense()
matrix([[ 0.,  0.,  0.,  0.,  0.],
        [ 1.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  1.,  0.,  0.],
        [ 0.,  0.,  1.,  0.,  1.],
        [ 0.,  0.,  0.,  0.,  0.]])
>>> swap_rows(mat, 1, 3)
<5x5 sparse matrix of type '<type 'numpy.float64'>'
    with 4 stored elements in LInked List format>
>>> swap_rows(mat, 1, 3).todense()
matrix([[ 0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  1.,  0.,  1.],
        [ 0.,  0.,  1.,  0.,  0.],
        [ 1.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.]])
>>> swap_cols(mat, 0, 4)
<5x5 sparse matrix of type '<type 'numpy.float64'>'
    with 4 stored elements in LInked List format>
>>> swap_cols(mat, 0, 4).todense()
matrix([[ 0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  1.],
        [ 0.,  0.,  1.,  0.,  0.],
        [ 1.,  0.,  1.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.]])

I have used a LIL matrix to show how you could preserve the type of your output. In your application you probably want to already be in CSC or CSR format, and choose whether to swap rows or columns first based on that format, to minimize conversions.
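
When many swaps are needed, the same idea can be extended to apply a whole permutation in one pass without leaving the matrix's native format. This is a minimal sketch, assuming the input is already CSC. `permute_rows_csc` and its `dest` argument are hypothetical names introduced here and are not part of scipy.

```python
import numpy as np
import scipy.sparse


def permute_rows_csc(mat_csc, dest):
    """Return a copy of a CSC matrix in which old row r becomes row dest[r].

    `dest` is assumed to be a permutation of range(mat_csc.shape[0]).
    """
    dest = np.asarray(dest)
    out = mat_csc.copy()
    # One vectorized lookup relabels every stored row index.
    out.indices = dest[out.indices].astype(out.indices.dtype)
    # Row indices inside each column may now be out of order.
    out.has_sorted_indices = False
    out.sort_indices()
    return out


dense = np.zeros((5, 5))
dense[[1, 2, 3, 3], [0, 2, 2, 4]] = 1
mat = scipy.sparse.csc_matrix(dense)

# Swap rows 1 and 3, and rows 0 and 4, in a single pass.
dest = np.array([4, 3, 2, 1, 0])
result = permute_rows_csc(mat, dest)
print(result.toarray())
```

Applying the same relabelling to the `indices` of a CSR matrix would permute columns instead of rows.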
