Iterating through a scipy.sparse vector (or matrix)

Problem description

I'm wondering what the best way is to iterate nonzero entries of sparse matrices with scipy.sparse. For example, if I do the following:

from scipy.sparse import lil_matrix

x = lil_matrix( (20,1) )
x[13,0] = 1
x[15,0] = 2

c = 0
for i in x:
  print c, i
  c = c+1

the output is

0 
1 
2 
3 
4 
5 
6 
7 
8 
9 
10 
11 
12 
13   (0, 0) 1.0
14 
15   (0, 0) 2.0
16 
17 
18 
19  

so it appears the iterator is touching every element, not just the nonzero entries. I've had a look at the API

http://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.lil_matrix.html

and searched around a bit, but I can't seem to find a solution that works.
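A minimal Python 3 sketch of what the loop above is actually doing: iterating over a SciPy sparse matrix walks its rows, not its stored entries. Each item is one row of x (here a 1 × 1 sparse matrix), so the loop visits all 20 rows even though only rows 13 and 15 store a value. The names `rows` and `c` are just for illustration.

```python
from scipy.sparse import lil_matrix

x = lil_matrix((20, 1))
x[13, 0] = 1
x[15, 0] = 2

# Iteration yields one row per step; each row is itself a sparse matrix.
rows = list(x)
for c, row in enumerate(rows):
    # shape is (1, 1) for every row; nnz counts the values stored in that row
    print(c, row.shape, row.nnz)

# Only the rows that actually store a value
print([c for c, row in enumerate(rows) if row.nnz > 0])
```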

Solution

Edit: bbtrb's method (converting to a coo_matrix) is much faster than my original suggestion of using nonzero(), which indexes x[row,col] once per entry. Sven Marnach's suggestion to use itertools.izip also improves the speed. The current fastest is using_tocoo_izip:

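import scipy.sparse
import random
import itertools

# Each loop body only builds a tuple and discards it,
# so the timings measure the cost of iterating.

def using_nonzero(x):
    # Original suggestion: find the nonzero positions, then index x once per entry.
    rows,cols = x.nonzero()
    for row,col in zip(rows,cols):
        ((row,col), x[row,col])

def using_coo(x):
    # bbtrb's method: build a COO copy and walk its parallel row/col/data arrays.
    cx = scipy.sparse.coo_matrix(x)
    for i,j,v in zip(cx.row, cx.col, cx.data):
        (i,j,v)

def using_tocoo(x):
    # Same idea, using the matrix's own tocoo() conversion.
    cx = x.tocoo()
    for i,j,v in zip(cx.row, cx.col, cx.data):
        (i,j,v)

def using_tocoo_izip(x):
    # Python 2: itertools.izip avoids building the intermediate list that zip creates.
    cx = x.tocoo()
    for i,j,v in itertools.izip(cx.row, cx.col, cx.data):
        (i,j,v)

# Benchmark input: an N x N lil_matrix with up to N random nonzero entries.
N=200
x = scipy.sparse.lil_matrix( (N,N) )
for _ in xrange(N):
    x[random.randint(0,N-1),random.randint(0,N-1)]=random.randint(1,100)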

With the code above saved as test.py, timeit gives these results:

% python -mtimeit -s'import test' 'test.using_tocoo_izip(test.x)'
1000 loops, best of 3: 670 usec per loop
% python -mtimeit -s'import test' 'test.using_tocoo(test.x)'
1000 loops, best of 3: 706 usec per loop
% python -mtimeit -s'import test' 'test.using_coo(test.x)'
1000 loops, best of 3: 802 usec per loop
% python -mtimeit -s'import test' 'test.using_nonzero(test.x)'
100 loops, best of 3: 5.25 msec per loop
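The benchmark above is Python 2 code. In Python 3, itertools.izip and xrange no longer exist, and the built-in zip is already lazy, so using_tocoo and using_tocoo_izip collapse into one version. The sketch below is a Python 3 port of the two extremes plus a small timeit harness. The generator name `iter_nonzero`, the helper `make_matrix`, its fixed seed, and returning lists (so the two approaches can be compared for equal results) are illustrative choices, not part of the original answer; timings will vary with the machine and SciPy version.

```python
import random
import timeit

import scipy.sparse


def iter_nonzero(x):
    """Yield (row, col, value) for every stored entry of sparse matrix x."""
    # One conversion to COO exposes parallel row/col/data arrays;
    # Python 3's zip is lazy, so it plays the role of itertools.izip.
    cx = x.tocoo()
    for i, j, v in zip(cx.row, cx.col, cx.data):
        yield i, j, v


def using_nonzero(x):
    # One x[row, col] lookup per entry: this per-element indexing is the slow part.
    rows, cols = x.nonzero()
    return [(row, col, x[row, col]) for row, col in zip(rows, cols)]


def using_tocoo(x):
    return list(iter_nonzero(x))


def make_matrix(n=200, seed=0):
    # Same construction as the benchmark above, with a fixed seed for repeatability.
    rng = random.Random(seed)
    x = scipy.sparse.lil_matrix((n, n))
    for _ in range(n):
        x[rng.randint(0, n - 1), rng.randint(0, n - 1)] = rng.randint(1, 100)
    return x


if __name__ == "__main__":
    x = make_matrix()
    loops = 100
    for fn in (using_tocoo, using_nonzero):
        t = timeit.timeit(lambda: fn(x), number=loops)
        print(f"{fn.__name__}: {t / loops * 1e6:.0f} usec per loop")
```

Both functions return the same entries in the same row-major order; the difference is that using_tocoo reads the COO arrays directly instead of going back through sparse indexing for every value.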
