How is numpy's fancy indexing implemented?

Problem Description

I was doing a little experimentation with 2D lists and numpy arrays, and it raised 3 questions I'm quite curious to know the answers to.

First, I initialized a 2D python list.

>>> my_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

I then tried indexing the list with a tuple.

>>> my_list[:,]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: list indices must be integers, not tuple

Since the interpreter throws a TypeError and not a SyntaxError, I surmised that the syntax itself is valid, but python lists do not natively support it.

I then tried converting the list to a numpy array and doing the same thing.

>>> np.array(my_list)[:,]
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

Of course, no problem. My understanding is that one of the __xx__() methods has been overridden and implemented in the numpy package.

Numpy's indexing supports lists too:

>>> np.array(my_list)[:,[0, 1]]
array([[1, 2],
       [4, 5],
       [7, 8]])

This has raised a couple of questions:

  1. Which __xx__ method has numpy overridden/defined to handle fancy indexing?
  2. Why don't python lists natively support fancy indexing?

(Bonus question: why do my timings show that slicing in python2 is slower than python3?)

Solution

You have three questions:

1. Which __xx__ method has numpy overridden/defined to handle fancy indexing?

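The indexing operator [] is overridable using __getitem__, __setitem__, and __delitem__. numpy's ndarray implements __getitem__ and __setitem__ so that they accept tuples, slices, and lists as keys. It can be fun to write a simple subclass that offers some introspection: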

>>> class VerboseList(list):
...     def __getitem__(self, key):
...         print(key)
...         return super().__getitem__(key)
...

Let's make an empty one first:

>>> l = VerboseList()

Now fill it with some values. Note that we haven't overridden __setitem__ so nothing interesting happens yet:

>>> l[:] = range(10)

Now let's get an item. The item at index 0 is 0, so we see the key printed first and then the value returned:

>>> l[0]
0
0

If we try to use a tuple, we get an error, but we get to see the tuple first!

>>> l[0, 4]
(0, 4)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 4, in __getitem__
TypeError: list indices must be integers or slices, not tuple
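The traceback above shows that the tuple reaches __getitem__ intact; it is list's own implementation that rejects it. As a minimal sketch of what accepting such keys could look like, here is a hypothetical Grid2D wrapper (the class and its _pick helper are invented for illustration, and this is not how numpy is implemented internally). It only handles ints, slices, and lists of ints on two axes, and it does not reproduce numpy's broadcasting rules when both keys are lists.

```python
class Grid2D:
    """Hypothetical list-of-lists wrapper that accepts tuple keys."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]

    @staticmethod
    def _pick(seq, key):
        # A list of integers selects several items; ints and slices
        # fall through to ordinary list indexing.
        if isinstance(key, list):
            return [seq[i] for i in key]
        return seq[key]

    def __getitem__(self, key):
        # obj[a, b] arrives here as the tuple (a, b); obj[a] arrives as a.
        if not isinstance(key, tuple):
            key = (key,)
        if len(key) > 2:
            raise IndexError("too many indices for a 2D grid")
        row_key = key[0]
        col_key = key[1] if len(key) == 2 else slice(None)
        picked = self._pick(self.rows, row_key)
        if isinstance(row_key, int):
            # A single row was selected, so index into it directly.
            return self._pick(picked, col_key)
        return [self._pick(row, col_key) for row in picked]


grid = Grid2D([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(grid[:, [0, 1]])   # [[1, 2], [4, 5], [7, 8]]
print(grid[1, 2])        # 6
print(grid[:,])          # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

The point of the sketch is only that the syntax is already there in plain Python; all of the behavior lives in whatever __getitem__ decides to do with the key.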

We can also find out how python represents slices internally:

>>> l[1:3]
slice(1, 3, None)
[1, 2]

There are lots more fun things you can do with this object -- give it a try!
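For instance, here is a small sketch of a few things to try. KeyRecorder is a hypothetical helper written for this illustration; it simply hands back whatever key the [] operator received, so the slice and tuple objects can be inspected directly.

```python
class KeyRecorder:
    """Hypothetical helper: returns the key that [] was called with."""

    def __getitem__(self, key):
        return key


rec = KeyRecorder()

s = rec[1:3]
print(s.start, s.stop, s.step)      # 1 3 None
# slice.indices(n) resolves None and negative values for a sequence of length n.
print(s.indices(10))                # (1, 3, 1)
print(list(range(*s.indices(10))))  # [1, 2]
print(rec[::-1].indices(5))         # (4, -1, -1)

# Commas inside the brackets build a tuple, which is what numpy receives.
print(rec[:,])                      # (slice(None, None, None),)
print(rec[:, [0, 1]])               # (slice(None, None, None), [0, 1])
print(rec[..., 0])                  # (Ellipsis, 0)
```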

2. Why don't python lists natively support fancy indexing?

This is hard to answer. One way of thinking about it is historical: because the numpy developers thought of it first.

You youngsters. When I was a kid...

Upon its first public release in 1991, Python had no numpy library, and to make a multi-dimensional list, you had to nest list structures. I assume that the early developers -- in particular, Guido van Rossum (GvR) -- felt that keeping things simple was best, initially. Slice indexing was already pretty powerful.

However, not too long after, interest grew in using Python as a scientific computing language. Between 1995 and 1997, a number of developers collaborated on a library called numeric, an early predecessor of numpy. Though he wasn't a major contributor to numeric or numpy, GvR coordinated with the numeric developers, extending Python's slice syntax in ways that made multidimensional array indexing easier. Later, an alternative to numeric arose called numarray; and in 2006, numpy was created, incorporating the best features of both.

These libraries were powerful, but they required heavy C extensions and so on. Working them into the base Python distribution would have made it bulky. And although GvR did enhance slice syntax a bit, adding fancy indexing to ordinary lists would have changed their API dramatically -- and somewhat redundantly. Given that fancy indexing could be had with an outside library already, the benefit wasn't worth the cost.

Parts of this narrative are speculative, in all honesty. [1] I don't really know the developers! But it's the same decision I would have made. In fact...

It really should be that way.

Although fancy indexing is very powerful, I'm glad it's not part of vanilla Python even today, because it means that you don't have to think very hard when working with ordinary lists. For many tasks you don't need it, and the cognitive load it imposes is significant.

Keep in mind that I'm talking about the load imposed on readers and maintainers. You may be a whiz-bang genius who can do 5-d tensor products in your head, but other people have to read your code. Keeping fancy indexing in numpy means people don't use it unless they honestly need it, which makes code more readable and maintainable in general.

3. Why is numpy's fancy indexing so slow on python2? Is it because I don't have native BLAS support for numpy in this version?

Possibly. It's definitely environment-dependent; I don't see the same difference on my machine.
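One way to check this on your own machine is to run the same timing script under each interpreter. The following is a rough sketch only: the time_slice function, the list size, and the slice bounds are arbitrary choices made for illustration, and it times plain list slicing so that it needs nothing beyond the standard library. To time numpy instead, the setup and statement strings would have to be swapped for the array expression you care about.

```python
import sys
import timeit


def time_slice(size=1000, number=10000, repeat=5):
    """Best-of-`repeat` time, in seconds, for `number` list slices."""
    setup = "data = list(range({}))".format(size)
    timings = timeit.repeat("data[10:500]", setup=setup,
                            number=number, repeat=repeat)
    # The minimum is the least noisy estimate of the achievable speed.
    return min(timings)


best = time_slice()
print("Python {}.{}: {:.6f} s".format(
    sys.version_info[0], sys.version_info[1], best))
```

The script avoids version-specific syntax, so the same file can be run unchanged under both python2 and python3 and the two printed lines compared.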


[1] The parts of the narrative that aren't as speculative are drawn from a brief history told in a special issue of Computing in Science and Engineering (2011 vol. 13).
