How do I shift my thinking to 'vectorize my computation' rather than using 'for-loops'?
Question
This is definitely more of a conceptual question, but I wanted to get input from others with expertise on this topic here at SO. Most of my programming lately involves NumPy arrays. I've been matching items in two or so arrays of different sizes. Most of the time I reach for a for-loop or, even worse, a nested for-loop. I'm ultimately trying to avoid for-loops as I gain more experience in data science, because Python-level for-loops over array elements run slower than vectorized operations.
I am well aware of NumPy and the predefined functions I can look up, but for those of you who are experienced, do you have a general school of thought when you iterate through something?
Something like the following:
import numpy as np

small_array = np.array(["a", "b"])
big_array = np.array(["a", "b", "c", "d"])

for i in range(len(small_array)):
    for p in range(len(big_array)):
        if small_array[i] == big_array[p]:
            print("This item is matched: ", small_array[i])
I'm well aware there is more than one way to skin a cat here, but I am interested in others' approaches and ways of thinking.
Recommended answer
Since I've been working with array languages for decades (APL, MATLAB, numpy), I can't help with the starting steps. But I suspect I work mostly from patterns, things I've seen and used in the past. And I do a lot of experimentation in an interactive session.
Taking your example:
In [273]: small_array = np.array(["a", "b"])
     ...: big_array = np.array(["a", "b", "c", "d"])
     ...:
     ...: for i in range(len(small_array)):
     ...:     for p in range(len(big_array)):
     ...:         if small_array[i] == big_array[p]:
     ...:             print("This item is matched: ", small_array[i])
     ...:
This item is matched: a
This item is matched: b
Often I run the iterative case first, just to get a clearer idea of what is desired.
In [274]: small_array
Out[274]:
array(['a', 'b'],
      dtype='<U1')
In [275]: big_array
Out[275]:
array(['a', 'b', 'c', 'd'],
      dtype='<U1')
I've seen this before: iterating over two arrays and doing something with the paired values. This is a kind of outer operation. There are various tools, but the one I like best makes use of numpy broadcasting. It turns one array into an (n,1) array and uses it with the other (m,) array:
In [276]: small_array[:,None]
Out[276]:
array([['a'],
       ['b']],
      dtype='<U1')
Broadcasting treats the (m,) array as (1,m), and the result of (n,1) operating with (1,m) is an (n,m) array:
In [277]: small_array[:,None]==big_array
Out[277]:
array([[ True, False, False, False],
       [False,  True, False, False]], dtype=bool)
Now I can take an any or all reduction on either axis:
In [278]: _.all(axis=0)
Out[278]: array([False, False, False, False], dtype=bool)
In [280]: __.all(axis=1)
Out[280]: array([False, False], dtype=bool)
I could also use np.where to reduce that boolean array to indices.
Oops, I should have used any rather than all:
In [284]: (small_array[:,None]==big_array).any(0)
Out[284]: array([ True, True, False, False], dtype=bool)
In [285]: (small_array[:,None]==big_array).any(1)
Out[285]: array([ True, True], dtype=bool)
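To tie this back to the original loop, here is a minimal sketch that uses the any reduction as a boolean mask to pull out the matched items, and np.where to turn the 2-D comparison into index pairs. The names eq, mask, matched, rows and cols are my own choices for illustration, not part of the answer above.

```python
import numpy as np

small_array = np.array(["a", "b"])
big_array = np.array(["a", "b", "c", "d"])

# (2,1) compared with (4,) broadcasts to a (2,4) boolean array
eq = small_array[:, None] == big_array

# True for each item of small_array that appears somewhere in big_array
mask = eq.any(axis=1)
matched = small_array[mask]
for item in matched:
    print("This item is matched:", item)

# np.where on a 2-D boolean array returns one index array per axis:
# rows index into small_array, cols index into big_array
rows, cols = np.where(eq)
print(rows, cols)
```

Boolean-mask indexing replaces the if inside the nested loop, and the index pairs from np.where say where each match sits in both arrays.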
Having played with this, I remember that there's an in1d function that does something similar:
In [286]: np.in1d(big_array, small_array)
Out[286]: array([ True, True, False, False], dtype=bool)
But when I look at the code for in1d (see the [source] link in the docs), I see that in some cases it actually iterates over the small array:
In [288]: for x in small_array:
...: print(x==big_array)
...:
[ True False False False]
[False True False False]
Compare that to Out[277]. x==big_array compares a scalar with an array. In numpy, doing something like ==, +, *, etc. with an array and a scalar is easy, and should become second nature. Doing the same thing with 2 arrays of matching shapes is the next step. And from there, do it with broadcastable shapes.
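The three steps described above can be sketched side by side. The array named other and the step1/step2/step3 names are made up for this illustration; the last lines check that stacking the rows from the explicit loop gives the same result as the broadcast comparison.

```python
import numpy as np

big_array = np.array(["a", "b", "c", "d"])
small_array = np.array(["a", "b"])

# Step 1: array with a scalar -> result has the array's shape, (4,)
step1 = big_array == "a"

# Step 2: two arrays of matching shape -> elementwise, still (4,)
other = np.array(["a", "x", "c", "y"])  # hypothetical second array
step2 = big_array == other

# Step 3: broadcastable shapes, (2,1) with (4,) -> (2,4)
step3 = small_array[:, None] == big_array

# The explicit loop over small_array builds the same rows one at a time
looped = np.array([x == big_array for x in small_array])

print(step1.shape, step2.shape, step3.shape)
print(np.array_equal(step3, looped))
```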
In other cases it uses np.unique and np.argsort.
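As a rough illustration of why sorting helps, here is a sketch of a sort-based membership test that avoids both the Python loop and the full (n,m) comparison array. This is not in1d's actual code: it swaps in np.unique plus np.searchsorted to show the general idea, and the helper name is_member is hypothetical.

```python
import numpy as np

def is_member(values, test_values):
    """Hypothetical helper: True where each item of `values` occurs in `test_values`."""
    values = np.asarray(values)
    table = np.unique(test_values)  # sorted, duplicates removed
    if table.size == 0:
        return np.zeros(values.shape, dtype=bool)
    # Position where each value would be inserted to keep `table` sorted
    pos = np.searchsorted(table, values)
    # Values larger than everything in `table` get pos == table.size; clip them
    pos = np.clip(pos, 0, table.size - 1)
    # A value is a member only if the table entry at its position equals it
    return table[pos] == values

big_array = np.array(["a", "b", "c", "d"])
small_array = np.array(["a", "b"])
print(is_member(big_array, small_array))
```

The trade-off is the usual one: the broadcast version is simplest but builds an (n,m) intermediate, while a sort-based version keeps memory proportional to the inputs.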
This pattern of creating a higher-dimensional array by broadcasting the inputs against each other, and then combining values with some sort of reduction (any, all, sum, mean, etc.), is very common.
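The same broadcast-then-reduce pattern works with numeric data. The sketch below uses made-up 2-D points (not from the question) to compute all pairwise squared distances with a sum reduction, then reduces again with argmin to find the nearest neighbour.

```python
import numpy as np

# Two small sets of 2-D points; the values are made up for illustration
a = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])  # shape (3, 2)
b = np.array([[0.0, 1.0], [4.0, 5.0]])              # shape (2, 2)

# Broadcast (3,1,2) against (1,2,2) -> (3,2,2) array of coordinate differences
diff = a[:, None, :] - b[None, :, :]

# Reduce over the coordinate axis: squared distance for every (a, b) pair
sq_dist = (diff ** 2).sum(axis=-1)                  # shape (3, 2)

# Reduce again: index of the nearest point in b for each point in a
nearest = sq_dist.argmin(axis=1)
print(sq_dist)
print(nearest)
```

Keep in mind that the intermediate array grows with the product of the input sizes, so for large inputs it may be worth processing in chunks.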