Fastest way to convert a Numpy array into a sparse dictionary?

Question

I'm interested in converting a numpy array into a sparse dictionary as quickly as possible. Let me elaborate:

Given the array:

numpy.array([12,0,0,0,3,0,0,1])

I would like to produce the dictionary:

{0:12, 4:3, 7:1}

As you can see, we are simply converting the sequence type into an explicit mapping from the indices of the nonzero entries to their values.

To make this a bit more interesting, I offer the following test harness for trying out alternatives:

from timeit import Timer

if __name__ == "__main__":
  s = "import numpy; from itertools import izip; from numpy import nonzero, flatnonzero; vector = numpy.random.poisson(0.1, size=10000);"

  ms = [ "f = flatnonzero(vector); dict( zip( f, vector[f] ) )"
             , "f = flatnonzero(vector); dict( izip( f, vector[f] ) )"
             , "f = nonzero(vector); dict( izip( f[0], vector[f] ) )"
             , "n = vector > 0; i = numpy.arange(len(vector))[n]; v = vector[n]; dict(izip(i,v))"
             , "i = flatnonzero(vector); v = vector[vector > 0]; dict(izip(i,v))"
             , "dict( zip( flatnonzero(vector), vector[flatnonzero(vector)] ) )"
             , "dict( zip( flatnonzero(vector), vector[nonzero(vector)] ) )"
             , "dict( (i, x) for i,x in enumerate(vector) if x > 0);"
             ]
  for m in ms:
    print "  %.2fs" % Timer(m, s).timeit(1000), m

I'm using a Poisson distribution to simulate the sort of arrays I am interested in converting.
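The harness above is written for Python 2 (`izip` and the `print` statement). A minimal sketch of a Python 3 port follows, assuming only that NumPy is installed. The names `SETUP`, `STATEMENTS`, and `run_benchmarks` are introduced here for illustration and are not part of the original question, only four of the eight statements are carried over, and the repeat count is lowered to keep a run short. Timings from this port will not be comparable to the figures reported in the question.

```python
from timeit import Timer

# Python 3 setup: the built-in zip is already lazy, so itertools.izip is not needed.
SETUP = (
    "import numpy; "
    "from numpy import nonzero, flatnonzero; "
    "vector = numpy.random.poisson(0.1, size=10000)"
)

# A subset of the question's candidate statements, with izip replaced by zip.
STATEMENTS = [
    "f = flatnonzero(vector); dict(zip(f, vector[f]))",
    "f = nonzero(vector); dict(zip(f[0], vector[f]))",
    "n = vector > 0; i = numpy.arange(len(vector))[n]; v = vector[n]; dict(zip(i, v))",
    "dict((i, x) for i, x in enumerate(vector) if x > 0)",
]


def run_benchmarks(number=100):
    """Time each statement `number` times and return (seconds, statement) pairs."""
    return [(Timer(stmt, SETUP).timeit(number), stmt) for stmt in STATEMENTS]


if __name__ == "__main__":
    for seconds, stmt in run_benchmarks():
        print("  %.2fs" % seconds, stmt)
```

Returning the pairs instead of printing inside the loop makes it easy to sort or compare the results afterwards.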

Here are my results so far:

   0.78s f = flatnonzero(vector); dict( zip( f, vector[f] ) )
   0.73s f = flatnonzero(vector); dict( izip( f, vector[f] ) )
   0.71s f = nonzero(vector); dict( izip( f[0], vector[f] ) )
   0.67s n = vector > 0; i = numpy.arange(len(vector))[n]; v = vector[n]; dict(izip(i,v))
   0.81s i = flatnonzero(vector); v = vector[vector > 0]; dict(izip(i,v))
   1.01s dict( zip( flatnonzero(vector), vector[flatnonzero(vector)] ) )
   1.03s dict( zip( flatnonzero(vector), vector[nonzero(vector)] ) )
   4.90s dict( (i, x) for i,x in enumerate(vector) if x > 0);

As you can see, the fastest solution I have found is

n = vector > 0;
i = numpy.arange(len(vector))[n]
v = vector[n]
dict(izip(i,v))

Is there a faster way?

The step

i = numpy.arange(len(vector))[n]

seems particularly clumsy: it generates an entire index array before selecting only some elements, even though we know that only around 1/10 of the elements are likely to be selected. I think this might still be improved.
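For reference, `numpy.flatnonzero` returns the same indices without building the full `arange` first, as the short check below illustrates. This is only a sketch of the equivalence. The seeded generator `numpy.random.default_rng(0)` is an assumption made here for reproducibility and is not used in the question. Note that the question's own timings show the `flatnonzero` variants running slightly slower than the `arange` version in that setup, so avoiding the intermediate array is not automatically a speedup.

```python
import numpy

# Seeded generator: an assumption for reproducibility, not part of the question.
rng = numpy.random.default_rng(0)
vector = rng.poisson(0.1, size=10000)

# The question's approach: build every index, then keep the selected ones.
n = vector > 0
i_arange = numpy.arange(len(vector))[n]

# flatnonzero returns the indices of the nonzero entries directly.
i_flat = numpy.flatnonzero(vector)

# Poisson samples are never negative, so "> 0" and "nonzero" select the same entries.
same_indices = numpy.array_equal(i_arange, i_flat)

# Build the sparse dictionary from the directly computed indices.
sparse = dict(zip(i_flat, vector[i_flat]))
```

If the array could contain negative values, `vector > 0` and `flatnonzero(vector)` would select different entries, so the two forms are interchangeable only for nonnegative data.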

Recommended answer

>>> import numpy as np
>>> a=np.array([12,0,0,0,3,0,0,1])
>>> {i:a[i] for i in np.nonzero(a)[0]}
{0: 12, 4: 3, 7: 1}
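The comprehension above stores NumPy scalar objects as keys and values. A possible variant, sketched below, converts both arrays with `tolist()` so the dictionary holds plain Python ints. The helper name `to_sparse_dict` is hypothetical and this variant is not from the answer. Whether it is faster than the comprehension depends on the array and the NumPy version, so it is worth timing before relying on it.

```python
import numpy as np


def to_sparse_dict(a):
    """Map each nonzero index of a 1-D array to its value, using plain Python scalars."""
    idx = np.flatnonzero(a)
    # tolist() converts the NumPy scalars to native Python numbers.
    return dict(zip(idx.tolist(), a[idx].tolist()))


a = np.array([12, 0, 0, 0, 3, 0, 0, 1])
result = to_sparse_dict(a)
print(result)  # {0: 12, 4: 3, 7: 1}

# The mapping is the same as the one built by the answer's comprehension.
matches_answer = result == {i: a[i] for i in np.nonzero(a)[0]}
```

Plain Python ints are also convenient when the dictionary is later serialized, for example with `json`, which does not accept NumPy integer types directly.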
