How does numpy's fancy indexing work?


Question


I was doing a little experimentation with 2D lists and numpy arrays. Three questions came out of it that I'm quite curious to know the answers to.

First, I initialized a 2D python list.

>>> my_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

I then tried indexing the list with a tuple.

>>> my_list[:,]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: list indices must be integers, not tuple

Since the interpreter throws a TypeError and not a SyntaxError, I surmised that the syntax itself is valid, but python lists do not natively support it.

I then tried converting the list to a numpy array and doing the same thing.

>>> np.array(my_list)[:,]
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

Of course, no problem. My understanding is that one of the __xx__() methods has been overridden and implemented in the numpy package.

Numpy's indexing supports lists too:

>>> np.array(my_list)[:,[0, 1]]
array([[1, 2],
       [4, 5],
       [7, 8]])

This has raised a couple of questions:

  1. Which __xx__ method has numpy overridden/defined to handle fancy indexing?
  2. Why don't python lists natively support fancy indexing?

Furthermore, I ran this code to compare slicing performance on python2 vs python3.

import timeit

print(timeit.timeit("list_1[:][:]", 
      setup="list_1 = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]"))

print(timeit.timeit("list_2[:,]", 
      setup="import numpy as np; list_2 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])"))

Python2 (running numpy version 1.8.0rc1):

0.352098941803
1.24272298813

Python3 (running numpy version 1.12.0):

0.23113773498334922
0.20699498101021163

This brings me to:

  3. Why is numpy's fancy indexing so slow on python2? Is it because I don't have native BLAS support for numpy in this version?

Let me know if I can clarify anything. Thanks.


Edit

Config for numpy on python2:

>>> np.show_config()
...
blas_opt_info:
    extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
    extra_compile_args = ['-msse3', '-I/BuildRoot/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.Internal.sdk/System/Library/Frameworks/vecLib.framework/Headers']
    define_macros = [('NO_ATLAS_INFO', 3)]
...
lapack_opt_info:
    extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
    extra_compile_args = ['-msse3']
    define_macros = [('NO_ATLAS_INFO', 3)]
...

And the same for python3:

>>> np.show_config()
....
blas_opt_info:
    extra_compile_args = ['-msse3', '-I/System/Library/Frameworks/vecLib.framework/Headers']
    extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
    define_macros = [('NO_ATLAS_INFO', 3), ('HAVE_CBLAS', None)]
....
lapack_opt_info:
    extra_compile_args = ['-msse3']
    extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
    define_macros = [('NO_ATLAS_INFO', 3), ('HAVE_CBLAS', None)]

Solution

You have three questions:

1. Which __xx__ method has numpy overridden/defined to handle fancy indexing?

The indexing operator [] is overridable using __getitem__, __setitem__, and __delitem__. It can be fun to write a simple subclass that offers some introspection:

>>> class VerboseList(list):
...     def __getitem__(self, key):
...         print(key)
...         return super().__getitem__(key)
...

Let's make an empty one first:

>>> l = VerboseList()

Now fill it with some values. Note that we haven't overridden __setitem__ so nothing interesting happens yet:

>>> l[:] = range(10)

Now let's get an item. The key 0 is printed first, and then the REPL echoes the value stored at index 0, which is also 0:

>>> l[0]
0
0

If we try to use a tuple, we get an error, but we get to see the tuple first!

>>> l[0, 4]
(0, 4)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 4, in __getitem__
TypeError: list indices must be integers or slices, not tuple
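Since the tuple reaches __getitem__ intact, a class is free to interpret it however it likes. Here is a minimal pure-Python sketch of that idea. The Grid2D class and its _select helper are hypothetical names invented for illustration, not anything from numpy, and the behaviour is only loosely modelled on numpy: in particular, when both keys are lists this sketch takes every row/column combination, whereas numpy pairs the two lists up element by element.

```python
class Grid2D:
    """Hypothetical wrapper that accepts numpy-style 2D keys on a nested list."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]

    @staticmethod
    def _select(seq, key):
        # Return (items, is_scalar) for an int, slice, or list-of-ints key.
        if isinstance(key, int):
            return [seq[key]], True
        if isinstance(key, slice):
            return seq[key], False
        if isinstance(key, list):
            return [seq[i] for i in key], False
        raise TypeError(f"unsupported index: {key!r}")

    def __getitem__(self, key):
        # obj[a, b] arrives as the tuple (a, b); obj[a] arrives as a alone.
        if not isinstance(key, tuple):
            key = (key,)
        if len(key) == 1:
            key = (key[0], slice(None))  # missing column key means "all columns"
        if len(key) != 2:
            raise IndexError("Grid2D takes at most two indices")
        row_key, col_key = key
        rows, one_row = self._select(self.rows, row_key)
        picked = []
        for row in rows:
            cols, one_col = self._select(row, col_key)
            picked.append(cols[0] if one_col else cols)
        # Unlike numpy, two list keys give every row/column combination here.
        return picked[0] if one_row else picked


g = Grid2D([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(g[:, [0, 1]])   # [[1, 2], [4, 5], [7, 8]]
print(g[1, 2])        # 6
```

The point is only that all the dispatching happens inside one ordinary method; numpy does the same kind of thing, just in C and with far more cases.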

We can also find out how python represents slices internally:

>>> l[1:3]
slice(1, 3, None)
[1, 2]

There are lots more fun things you can do with this object -- give it a try!
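As a starting point for that experimentation, here is a small sketch that shows exactly which key object Python builds for each kind of subscript, including the ones from the question. KeyEcho is a hypothetical helper class made up for this example; the slice attributes and the slice.indices() method are part of standard Python.

```python
class KeyEcho:
    """Hypothetical helper: indexing it simply returns the key it was given."""

    def __getitem__(self, key):
        return key


echo = KeyEcho()

print(echo[0])           # 0
print(echo[1:3])         # slice(1, 3, None)
print(echo[:,])          # (slice(None, None, None),)
print(echo[:, [0, 1]])   # (slice(None, None, None), [0, 1])

# A slice object exposes its three parts as attributes...
s = echo[1:3]
print(s.start, s.stop, s.step)   # 1 3 None

# ...and indices(length) resolves them against a concrete sequence length.
print(s.indices(10))             # (1, 3, 1)
print(echo[::-1].indices(5))     # (4, -1, -1)
```

So my_list[:,] and np.array(my_list)[:,] hand the very same one-element tuple to __getitem__; the only difference is what each type does with it.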

2. Why don't python lists natively support fancy indexing?

This is hard to answer. One way of thinking about it is historical: because the numpy developers thought of it first.

You youngsters. When I was a kid...

Upon its first public release in 1991, Python had no numpy library, and to make a multi-dimensional list, you had to nest list structures. I assume that the early developers -- in particular, Guido van Rossum (GvR) -- felt that keeping things simple was best, initially. Slice indexing was already pretty powerful.

However, not too long after, interest grew in using Python as a scientific computing language. Between 1995 and 1997, a number of developers collaborated on a library called numeric, an early predecessor of numpy. Though he wasn't a major contributor to numeric or numpy, GvR coordinated with the numeric developers, extending Python's slice syntax in ways that made multidimensional array indexing easier. Later, an alternative to numeric arose called numarray; and in 2006, numpy was created, incorporating the best features of both.

These libraries were powerful, but they required heavy C extensions and so on. Working them into the base Python distribution would have made it bulky. And although GvR did enhance slice syntax a bit, adding fancy indexing to ordinary lists would have changed their API dramatically -- and somewhat redundantly. Given that fancy indexing could be had with an outside library already, the benefit wasn't worth the cost.

Parts of this narrative are speculative, in all honesty (see note 1 below). I don't really know the developers! But it's the same decision I would have made. In fact...

It really should be that way.

Although fancy indexing is very powerful, I'm glad it's not part of vanilla Python even today, because it means that you don't have to think very hard when working with ordinary lists. For many tasks you don't need it, and the cognitive load it imposes is significant.

Keep in mind that I'm talking about the load imposed on readers and maintainers. You may be a whiz-bang genius who can do 5-d tensor products in your head, but other people have to read your code. Keeping fancy indexing in numpy means people don't use it unless they honestly need it, which makes code more readable and maintainable in general.

3. Why is numpy's fancy indexing so slow on python2? Is it because I don't have native BLAS support for numpy in this version?

Possibly. It's definitely environment-dependent; I don't see the same difference on my machine.
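If you want to check on your own machine, a rough sketch like the following may help. One thing worth noting: the expression timed in the question, list_2[:,], contains only a slice, so as far as I can tell numpy treats it as basic indexing (which returns a view) rather than fancy indexing (which returns a copy). The sketch times both forms separately. The repetition count is an arbitrary choice, the globals= argument to timeit needs Python 3.5 or later, and np.shares_memory needs a reasonably recent numpy, so this will not run unchanged on the old python2 setup. No timings are shown because they depend on the environment.

```python
import timeit

import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

basic = a[:,]          # key is (slice(None),): slices only, so basic indexing
fancy = a[:, [0, 1]]   # key contains a list, so advanced ("fancy") indexing

# Basic indexing gives a view on the same buffer; fancy indexing copies.
print(np.shares_memory(a, basic))   # True
print(np.shares_memory(a, fancy))   # False

# Time each expression on its own; the numbers vary by machine and version.
for stmt in ("a[:,]", "a[:, [0, 1]]"):
    seconds = timeit.timeit(stmt, globals={"a": a}, number=100_000)
    print(stmt, seconds)
```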


1. The parts of the narrative that aren't as speculative are drawn from a brief history told in a special issue of Computing in Science and Engineering (2011 vol. 13).
