How do I shift my thinking to 'vectorize my computation' rather than using 'for-loops'?


Question

This is definitely more of a conceptual question, but I wanted to get input from others with expertise on this topic here on SO. Most of my programming lately involves NumPy arrays. I've been matching items in two or so arrays of different sizes. Most of the time I reach for a for-loop or, even worse, a nested for-loop. I'm ultimately trying to avoid for-loops as I gain more experience in data science, because they perform slower.

I am well aware of NumPy and the predefined functions I can research, but for those of you who are experienced, do you have a general school of thought when you iterate through something?

Something like the following:

import numpy as np

small_array = np.array(["a", "b"])
big_array = np.array(["a", "b", "c", "d"])

for i in range(len(small_array)):
    for p in range(len(big_array)):
        if small_array[i] == big_array[p]:
            print("This item is matched: ", small_array[i])

I'm well aware there is more than one way to skin a cat here, but I am interested in others' approaches and ways of thinking.

Recommended answer

Since I've been working with array languages for decades (APL, MATLAB, numpy), I can't help with the starting steps. But I suspect I work mostly from patterns, things I've seen and used in the past. And I do a lot of experimentation in an interactive session.

Taking your example:

In [273]: small_array = np.array(["a", "b"])
     ...: big_array = np.array(["a", "b", "c", "d"])
     ...: 
     ...: for i in range(len(small_array)):
     ...:     for p in range(len(big_array)):
     ...:         if small_array[i] == big_array[p]:
     ...:             print( "This item is matched: ", small_array[i])
     ...:             
This item is matched:  a
This item is matched:  b

Often I run the iterative case just to get a clear(er) idea of what is desired.

In [274]: small_array
Out[274]: 
array(['a', 'b'],
      dtype='<U1')
In [275]: big_array
Out[275]: 
array(['a', 'b', 'c', 'd'],
      dtype='<U1')

I've seen this before: iterating over two arrays and doing something with the paired values. This is a kind of outer operation. There are various tools, but the one I like best makes use of numpy broadcasting. It turns one array into an (n,1) array and uses it with the other (m,) array:

In [276]: small_array[:,None]
Out[276]: 
array([['a'],
       ['b']],
      dtype='<U1')

The (m,) array is treated as (1,m) when broadcasting, and the result of (n,1) operating with (1,m) is an (n,m) array:

In [277]: small_array[:,None]==big_array
Out[277]: 
array([[ True, False, False, False],
       [False,  True, False, False]], dtype=bool)
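As a small sketch of why the shapes work out this way, the broadcast comparison can be checked against the original nested loops. The variable names `column`, `matches`, and `looped` are introduced here purely for illustration; they are not part of the session above.

```python
import numpy as np

small_array = np.array(["a", "b"])
big_array = np.array(["a", "b", "c", "d"])

column = small_array[:, None]   # shape (2, 1)
matches = column == big_array   # (2, 1) against (4,) broadcasts to (2, 4)

# The same table built with explicit loops, for comparison only.
looped = np.zeros((len(small_array), len(big_array)), dtype=bool)
for i in range(len(small_array)):
    for p in range(len(big_array)):
        looped[i, p] = small_array[i] == big_array[p]

print(column.shape, big_array.shape, matches.shape)
print(np.array_equal(matches, looped))
```

Row i, column p of the broadcast result holds the same test the inner `if` performed for that (i, p) pair.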

Now I can take an any or all reduction on either axis:

In [278]: _.all(axis=0)
Out[278]: array([False, False, False, False], dtype=bool)

In [280]: __.all(axis=1)
Out[280]: array([False, False], dtype=bool)

I could also use np.where to reduce that boolean array to indices.

Oops, I should have used any:

In [284]: (small_array[:,None]==big_array).any(0)
Out[284]: array([ True,  True, False, False], dtype=bool)
In [285]: (small_array[:,None]==big_array).any(1)
Out[285]: array([ True,  True], dtype=bool)
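The np.where remark above can be sketched as follows. This is one possible way to recover the matched pairs, not something shown in the original session; the names `matches`, `small_idx`, and `big_idx` are illustrative.

```python
import numpy as np

small_array = np.array(["a", "b"])
big_array = np.array(["a", "b", "c", "d"])

matches = small_array[:, None] == big_array   # boolean table, shape (2, 4)

# On a 2-D boolean array, np.where returns one index array per axis.
small_idx, big_idx = np.where(matches)

# Each (i, p) pair is a position where the original inner `if` was true.
for i, p in zip(small_idx, big_idx):
    print("This item is matched:", small_array[i], "at big_array index", p)

# The any reduction answers a coarser question: was there any match at all?
found_in_big = matches.any(axis=0)   # one flag per big_array element
```

The index arrays keep the pairing information that the any reduction discards.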

Having played with this, I remember that there's an in1d that does something similar:

In [286]: np.in1d(big_array, small_array)
Out[286]: array([ True,  True, False, False], dtype=bool)
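As a side note that goes beyond the original answer: to my knowledge, newer NumPy releases offer np.isin for the same membership test and deprecate np.in1d, so a present-day version of this step might look like the sketch below. Check the documentation for the NumPy version you have installed.

```python
import numpy as np

small_array = np.array(["a", "b"])
big_array = np.array(["a", "b", "c", "d"])

# Membership test: for each element of big_array, is it in small_array?
mask = np.isin(big_array, small_array)

# A boolean mask can index the array directly to pull out the matches.
matched = big_array[mask]

# The broadcast-and-reduce version from above gives the same mask.
broadcast_mask = (small_array[:, None] == big_array).any(axis=0)

print(mask)
print(matched)
```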

But when I look at the code for in1d (see the [source] link in the docs), I see that in some cases it actually iterates over the small array:

In [288]: for x in small_array:
     ...:     print(x==big_array)
     ...:     
[ True False False False]
[False  True False False]

Compare that to Out[277]. x==big_array compares a scalar with an array. In numpy, doing something like ==, +, * etc. with an array and a scalar is easy, and should become second nature. Doing the same thing with 2 arrays of matching shapes is the next step. And from there, do it with broadcastable shapes.
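The three steps in that progression can be sketched with small numeric arrays. The data and the names `big`, `other`, and `small` are made up for illustration.

```python
import numpy as np

big = np.array([1, 2, 3, 4])

# Step 1: an array with a scalar; the scalar is applied to every element.
step1 = big * 10

# Step 2: two arrays of matching shape; the operation runs element by element.
other = np.array([10, 20, 30, 40])
step2 = big + other

# Step 3: broadcastable shapes; (2, 1) with (4,) produces a (2, 4) result.
small = np.array([1, 2])
step3 = small[:, None] * big

print(step1)
print(step2)
print(step3)
```

Each step replaces a loop level: none for the scalar case, one for matching shapes, and two for the broadcast case.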

In other cases it uses np.unique and np.argsort.

This pattern of creating a higher-dimensional array by broadcasting the inputs against each other, and then combining values with some sort of reduction (any, all, sum, mean, etc.), is very common.
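As one more instance of the broadcast-then-reduce pattern, here is a sketch that finds, for each query value, the nearest candidate value. The problem, the data, and the names `queries`, `candidates`, and `diff` are hypothetical and chosen only to show the pattern on numbers instead of strings.

```python
import numpy as np

queries = np.array([1.0, 4.2, 9.0])            # hypothetical data
candidates = np.array([0.0, 4.0, 5.0, 10.0])   # hypothetical data

# Broadcast: (3, 1) against (4,) gives every pairwise distance, shape (3, 4).
diff = np.abs(queries[:, None] - candidates)

# Reduce along axis 1: index of the closest candidate for each query.
nearest_idx = diff.argmin(axis=1)
nearest = candidates[nearest_idx]

# A different reduction on the same table: how many candidates lie within 1.0?
count_within_one = (diff <= 1.0).sum(axis=1)

print(nearest)
print(count_within_one)
```

One caveat worth keeping in mind: the intermediate array holds n*m values, so for very large inputs this pattern trades memory for speed.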
