Replace elements in numpy array avoiding loops


Problem description

I have a fairly large 1D numpy array Xold with given values. These values should be replaced according to the rule specified by a 2D numpy array Y. An example would be:

import numpy as np
Xold = np.array([0, 1, 2, 3, 4])
Y = np.array([[0, 0], [1, 100], [3, 300], [4, 400], [2, 200]])

Whenever a value in Xold is identical to a value in Y[:,0], the new value in Xnew should be the corresponding value in Y[:,1]. This is accomplished by two nested for loops:

Xnew = np.zeros(len(Xold))
for i in range(len(Xold)):
    for j in range(len(Y)):
        if Xold[i] == Y[j, 0]:
            Xnew[i] = Y[j, 1]

With the given example, this yields Xnew=[0,100,200,300,400]. However, for large data sets this procedure is quite slow. What is a faster and more elegant way to accomplish this task?

Recommended answer

Selecting the fastest method

The answers to this question provide a nice assortment of ways to replace elements in a numpy array. Let's check which one is the quickest.

TL;DR: Numpy indexing is the winner

def meth1():  # suggested by @Slam; note that this modifies Xold in place
    for old, new in Y:
        Xold[Xold == old] = new

def meth2():  # suggested by myself; build y_dict = dict(Y) first
    [y_dict[i] if i in y_dict.keys() else i for i in Xold]

def meth3():  # suggested by @Eelco Hoogendoom; import numpy_indexed as npi first
    npi.remap(Xold, keys=Y[:, 0], values=Y[:, 1])

def meth4():  # suggested by @Brad Solomon; import pandas as pd first
    pd.Series(Xold).map(pd.Series(Y[:, 1], index=Y[:, 0])).values

# suggested by @jdehesa; create Xnew = Xold.copy() and the index
# idx = np.searchsorted(Xold, Y[:, 0]) first
def meth5():
    Xnew[idx] = Y[:, 1]
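The searchsorted() idea in meth5 relies on Xold being sorted and on every key in Y[:, 0] occurring in Xold. For data where that may not hold, one possible variation is to sort the keys instead and look up each element of Xold among them. The following is only a sketch under that assumption, not the code from any of the answers; the function name remap_with_searchsorted is made up for illustration, and it assumes a non-empty mapping with unique keys.

```python
import numpy as np

def remap_with_searchsorted(x_old, mapping):
    """Return a copy of x_old with mapping[:, 0] replaced by mapping[:, 1].

    Hypothetical helper: values of x_old that are not keys stay unchanged.
    Assumes mapping is non-empty and its keys are unique.
    """
    # Sort the mapping by key so that searchsorted can binary-search it.
    order = np.argsort(mapping[:, 0])
    keys = mapping[order, 0]
    values = mapping[order, 1]

    # For every element of x_old, find where it would sit among the sorted keys.
    pos = np.searchsorted(keys, x_old)
    # A position equal to len(keys) is out of range, so clip before indexing.
    pos = np.clip(pos, 0, len(keys) - 1)

    # Replace only the elements whose value really equals the key found there.
    found = keys[pos] == x_old
    x_new = x_old.copy()
    x_new[found] = values[pos[found]]
    return x_new

Xold = np.array([0, 1, 2, 3, 4])
Y = np.array([[0, 0], [1, 100], [3, 300], [4, 400], [2, 200]])
Xnew = remap_with_searchsorted(Xold, Y)
print(Xnew)  # [  0 100 200 300 400]
```

Sorting the keys rather than relying on the order of Xold costs one extra argsort over Y, but lets the same lookup work for unsorted input and for values that have no replacement.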

The results are not that surprising:

 In [39]: timeit.timeit(meth1, number=1000000)                                                                      
 Out[39]: 12.08

 In [40]: timeit.timeit(meth2, number=1000000)                                                                      
 Out[40]: 2.87

 In [38]: timeit.timeit(meth3, number=1000000)                                                                      
 Out[38]: 55.39

 In [12]: timeit.timeit(meth4, number=1000000)                                                                                      
 Out[12]: 256.84

 In [50]: timeit.timeit(meth5, number=1000000)                                                                                      
 Out[50]: 1.12

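So the good old list comprehension is the second fastest (2.87 s for 1,000,000 calls), and the winning approach is numpy indexing combined with searchsorted() (1.12 s). Note that meth2 and meth5 are timed without their preparation steps: building y_dict for meth2, and creating Xnew and idx for meth5.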
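To repeat a comparison like this on your own data, a small harness along the following lines may help. It is a sketch rather than the benchmark used above: the function names are made up, only numpy-based and dict-based variants are included (no pandas or numpy_indexed), every function starts from an unmodified Xold, and the searchsorted variant includes its own setup cost, so the numbers will not match the ones quoted.

```python
import timeit
import numpy as np

Xold = np.array([0, 1, 2, 3, 4])
Y = np.array([[0, 0], [1, 100], [3, 300], [4, 400], [2, 200]])

def loop_replace():
    # Work on a copy so repeated calls always start from the same input,
    # and build each mask from the original Xold to avoid chained replacements.
    x = Xold.copy()
    for old, new in Y:
        x[Xold == old] = new
    return x

# Plain Python ints as keys and values; built once, outside the timed function.
y_dict = dict(Y.tolist())

def dict_replace():
    # Look up each value, falling back to the value itself if it is not a key.
    return np.array([y_dict.get(v, v) for v in Xold.tolist()])

def searchsorted_replace():
    # Sorting the keys is deliberately part of the timed work here.
    order = np.argsort(Y[:, 0])
    keys, values = Y[order, 0], Y[order, 1]
    pos = np.clip(np.searchsorted(keys, Xold), 0, len(keys) - 1)
    return np.where(keys[pos] == Xold, values[pos], Xold)

candidates = [loop_replace, dict_replace, searchsorted_replace]

# Sanity check: all variants must agree before their timings are compared.
results = [f() for f in candidates]
assert all(np.array_equal(results[0], r) for r in results)

for f in candidates:
    seconds = timeit.timeit(f, number=10_000)
    print(f"{f.__name__}: {seconds:.3f} s for 10,000 calls")
```

With a five-element array the fixed overhead of each call dominates, so the ranking can change considerably for large arrays; it is worth rerunning the harness with data of realistic size.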
