Order of indexes in a Numpy multidimensional array

Problem description

For example, say I'm simulating a bunch of particles doing something over time, and I have a multidimensional array called particles with these indexes:


  • The x/y/z coordinates of the particle (of length a, which is 3 for a 3d space)
  • The index of the individual particle (of length b)
  • The index of the time step it's on (of length c)

Is it better to construct the array such that particles.shape == (a, b, c) or particles.shape == (c, b, a)?

I'm more interested in convention than efficiency: Numpy arrays can be set up in either C-style (last index varies most rapidly) or Fortran-style (first index), so it can efficiently support either setup. I also realize I can use transpose to put the indexes in any order I need, but I'd like to minimize that.
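As a rough illustration of the two memory layouts and of transpose, here is a minimal sketch. The sizes `a`, `b`, `c` and the array names are made up for the example; only the NumPy calls themselves are standard.

```python
import numpy as np

a, b, c = 3, 4, 5  # hypothetical sizes: coordinates, particles, time steps

c_style = np.zeros((c, b, a), order="C")  # last index varies fastest in memory
f_style = np.zeros((a, b, c), order="F")  # first index varies fastest in memory

# Strides are in bytes (float64 takes 8 bytes per element).
print(c_style.strides)  # (96, 24, 8)
print(f_style.strides)  # (8, 24, 96)

# transpose() reverses the axes and returns a view: no data is copied.
swapped = c_style.transpose()
print(swapped.shape)                        # (3, 4, 5)
print(np.shares_memory(swapped, c_style))   # True
print(swapped.flags["F_CONTIGUOUS"])        # True
```

In this sketch the (c, b, a) C-ordered array and the (a, b, c) Fortran-ordered array have the same byte layout, which is why either index order can be stored efficiently.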

I started to research this myself and found support for both ways:

Pro (c, b, a):


  • By default, Numpy uses C-style arrays where the last index is the fastest-varying.
  • Most of the vector algebra functions (inner, cross, etc.) act on the last index. (dot acts on the last of one and the second-to-last of the other.)
  • The matplotlib collection objects (LineCollection, PolyCollection) expect arrays with the spatial coordinates in the last axis.
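The last-axis behaviour of these functions can be seen in a small sketch. The array names, sizes, and random data below are assumptions for illustration, using the (c, b, a) layout with a == 3.

```python
import numpy as np

rng = np.random.default_rng(0)
c, b = 5, 4  # hypothetical numbers of time steps and particles

positions = rng.normal(size=(c, b, 3))   # spatial coordinates on the last axis
velocities = rng.normal(size=(c, b, 3))

# cross pairs up the vectors found along the last axis of each input.
angular = np.cross(positions, velocities)
print(angular.shape)  # (5, 4, 3)

# inner contracts the last axis of both arguments, so projecting onto a
# unit vector along z simply picks out the z coordinate.
z_hat = np.array([0.0, 0.0, 1.0])
z_coords = np.inner(positions, z_hat)
print(z_coords.shape)  # (5, 4)

# dot contracts the last axis of the first argument with the
# second-to-last axis of the second, here a 3x3 matrix.
identity = np.eye(3)
unchanged = np.dot(positions, identity)
print(unchanged.shape)  # (5, 4, 3)
```

With the spatial axis first, each of these calls would need an explicit `axis` argument or a transpose.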

Pro (a, b, c):


  • If I were to use meshgrid or mgrid to produce a set of points, it would put the spatial axis first. For instance, np.mgrid[0:5,0:5,0:5].shape == (3,5,5,5). I realize these functions are mostly intended for integer array indexing, but it's not uncommon to use them to generate a grid of points.
  • The matplotlib scatter and plot functions take each coordinate as a separate argument, so they're agnostic to the shape of the array, but ax.plot3D(particles[0], particles[1], particles[2]) is shorter to type than the version with particles[..., 0]
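A short sketch of the mgrid case, and of switching between the two conventions without copying. The variable names are illustrative only.

```python
import numpy as np

grid = np.mgrid[0:5, 0:5, 0:5]
print(grid.shape)  # (3, 5, 5, 5): spatial axis first

# With the spatial axis first, unpacking into coordinates is direct.
x, y, z = grid

# moveaxis returns a view with the spatial axis moved to the end.
points_last = np.moveaxis(grid, 0, -1)
print(points_last.shape)      # (5, 5, 5, 3)
print(points_last[2, 3, 4])   # [2 3 4]

# The same x coordinates, reached through the other convention.
print(np.array_equal(x, points_last[..., 0]))  # True
print(np.shares_memory(grid, points_last))     # True
```

Since `np.moveaxis` only changes how the same memory is indexed, the choice of convention mainly affects how much of this shuffling appears in the code.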

In general it appears that there are two different conventions in existence (probably due to historical differences between C and Fortran), and it's not clear which is more common in the Numpy community, or more appropriate for what I'm doing.

Recommended answer

Conventions for something like this have much more to do with particular file-formats than anything else, in my experience. However, there's a quick way to answer which one is likely to be best for what you're doing:

If you have to iterate over an axis, which one are you most likely to iterate over? In other words, which of these is most likely:

# a first
for dimension in particles:
    ...

# b first
for particle in particles:
    ...

# c first
for timestep in particles:
    ...

As far as efficiency goes, this assumes C-order, but that's actually irrelevant here. At the Python level, access to numpy arrays is treated as C-ordered regardless of the memory layout. (You always iterate over the first axis, even if that's not the "most contiguous" axis in memory.)
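To make that concrete, here is a minimal sketch, assuming a hypothetical (c, b, a) array of 5 time steps, 4 particles, and 3 coordinates. It stores the same values in both memory orders and iterates over each.

```python
import numpy as np

c, b, a = 5, 4, 3  # hypothetical sizes: time steps, particles, coordinates

particles_c = np.arange(c * b * a, dtype=float).reshape(c, b, a)
particles_f = np.asfortranarray(particles_c)  # same values, Fortran layout

print(particles_f.flags["F_CONTIGUOUS"])  # True
print(particles_f.flags["C_CONTIGUOUS"])  # False

# Iteration always walks the first axis, whatever the memory layout,
# so both loops yield the same (b, a) slice for each time step.
for step_c, step_f in zip(particles_c, particles_f):
    assert step_c.shape == (b, a)
    assert np.array_equal(step_c, step_f)

# To loop over a different axis, move it to the front first (this is a view).
for coordinate in np.moveaxis(particles_c, -1, 0):
    assert coordinate.shape == (c, b)
```

The memory order may still affect how fast each slice is read, but it does not change which axis a plain `for` loop walks.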

Of course, there are many situations where you should avoid directly iterating over numpy arrays in this manner. Nonetheless, this is the way you should think about it, particularly when it comes to on-disk file structures. Make your most common use case the fastest/easiest.

If nothing else, hopefully this gives you a useful way to think about the question.
