Is ndarray access faster than recarray access?

Question

I was able to copy my recarray data to an ndarray, do some calculations, and return the ndarray with updated values.

Then I discovered append_fields() in numpy.lib.recfunctions, and thought it would be a lot smarter to simply append 2 fields to my original recarray to hold my calculated values.
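For readers unfamiliar with it, here is a minimal sketch of the append_fields() approach described above. The field names (x, y, total, diff) and the data are made up for illustration and are not from the original post. One detail worth knowing: append_fields() has usemask=True by default, so unless told otherwise it returns a masked array rather than a plain recarray, which may be relevant to the timings in the answer.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Hypothetical input: a small recarray with two float fields, "x" and "y".
rec = np.rec.fromarrays([np.arange(5.0), np.arange(5.0) * 2], names="x,y")

# Two calculated columns to attach (illustrative calculations only).
total = rec.x + rec.y
diff = rec.y - rec.x

# Default call: usemask=True, so the result is a numpy.ma.MaskedArray.
out_masked = rfn.append_fields(rec, ["total", "diff"], [total, diff])
print(type(out_masked))

# Asking for an unmasked recarray instead.
out_rec = rfn.append_fields(
    rec, ["total", "diff"], [total, diff], usemask=False, asrecarray=True
)
print(type(out_rec), out_rec.dtype.names)
```

Note that append_fields() builds a new array each time it is called, so it is usually best called once, after the calculations are done, rather than inside a loop.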

When I did this, I found the operation was much, much slower. I didn't need to time it: the ndarray-based process takes a few seconds, compared with more than a minute with the recarray, and my test arrays are small (fewer than 10,000 rows).

Is this typical? Is ndarray access really that much faster than recarray access? I expected some performance degradation due to access by field name, but not this much.

Recommended answer

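Updated 15 November 2018
I expanded my timing tests to clarify the performance differences between ndarray, structured array, recarray, and masked array (a type of record array?). There are subtle differences between them. See the discussion here:
numpy-discussion:structured-arrays-recarrays-and-record-arrays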

Here are the results of my performance tests. I built a very simple example (using one of my HDF5 data sets) to compare performance with the same data stored in 4 types of arrays: ndarray, structured array, recarray, and masked array. After the arrays are constructed, they are passed to a function that simply loops through each row and extracts 12 values from each row. The functions are called from the timeit function with a single pass (number=1). This test measures only the array read operation and avoids all other calculations.
Results for 9,000 rows (in seconds, as returned by timeit):

for ndarray: 0.034137165047070615
for structured array: 0.1306827116913577
for recarray: 0.446010040784266
for masked array: 31.33269560998199

Based on this test, access performance decreases with each type. Access to the structured array and the recarray is about 4x and 13x slower, respectively, than ndarray access (but all three take only a fraction of a second). Masked array access, however, is roughly 900x slower than ndarray access, close to three orders of magnitude. That explains the seconds-to-minutes difference I see in my complete example. Hopefully this data is useful to others who encounter this issue.
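The test described above was not posted with its code, so the following is only a rough sketch of how such a comparison could be set up, not the original script. It uses synthetic random data instead of an HDF5 data set, made-up field names (f0 to f11), and 1,000 rows instead of 9,000 to keep the run short. Absolute timings will differ from the figures quoted above.

```python
import timeit
import numpy as np

N_ROWS, N_COLS = 1000, 12                  # the post used 9,000 rows
NAMES = [f"f{i}" for i in range(N_COLS)]   # hypothetical field names

# The same synthetic data stored four ways.
plain = np.random.default_rng(0).random((N_ROWS, N_COLS))   # plain 2-D ndarray
dtype = np.dtype([(name, "f8") for name in NAMES])
struct = np.zeros(N_ROWS, dtype=dtype)                       # structured array
for j, name in enumerate(NAMES):
    struct[name] = plain[:, j]
rec = struct.view(np.recarray)                               # recarray
mask = np.zeros(N_ROWS, dtype=np.ma.make_mask_descr(dtype))  # nothing masked
masked = np.ma.MaskedArray(struct, mask=mask)                # masked array


def read_by_position(arr):
    """Loop over rows of a 2-D ndarray and read all 12 values by index."""
    total = 0.0
    for i in range(len(arr)):
        row = arr[i]
        for j in range(N_COLS):
            total += row[j]
    return total


def read_by_name(arr):
    """Loop over rows of a record-like array and read all 12 fields by name."""
    total = 0.0
    for i in range(len(arr)):
        row = arr[i]
        for name in NAMES:
            total += row[name]
    return total


cases = {
    "ndarray": (read_by_position, plain),
    "structured array": (read_by_name, struct),
    "recarray": (read_by_name, rec),
    "masked array": (read_by_name, masked),
}

# Single pass per case (number=1), as in the description above.
timings = {
    label: timeit.timeit(lambda f=func, a=arr: f(a), number=1)
    for label, (func, arr) in cases.items()
}
for label, seconds in timings.items():
    print(f"for {label}: {seconds:.4f} s")
```

The row-by-row loop is what makes the per-access overhead of each array type visible. Reading a whole field at once (for example struct["f0"]) avoids most of that overhead regardless of array type.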
