Array vs Slice:访问速度 [英] Array vs Slice: accessing speed

查看:23
本文介绍了Array vs Slice:访问速度的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这个问题是关于访问数组和切片元素的速度,而不是关于将它们作为参数传递给函数的效率.

This question is about the speed of accessing elements of arrays and slices, not about the efficiency of passing them to functions as arguments.

我希望数组在大多数情况下比切片更快,因为切片是一种描述数组连续部分的数据结构,因此可能会有额外的访问切片的元素(间接访问其底层数组的元素)时涉及的步骤.

I would expect arrays to be faster than slices in most cases because a slice is a data structure describing a contiguous section of an array and so there may be an extra step involved when accessing elements of a slice (indirectly the elements of its underlying array).

所以我写了一个小测试来对一批简单的操作进行基准测试.有 4 个基准函数,前 2 个测试 全局 切片和全局数组,其他 2 个测试局部 切片和局部数组:

So I wrote a little test to benchmark a batch of simple operations. There are 4 benchmark functions, the first 2 test a global slice and a global array, the other 2 test a local slice and a local array:

var gs = make([]byte, 1000) // Global slice
var ga [1000]byte           // Global array

func BenchmarkSliceGlobal(b *testing.B) {
    for i := 0; i < b.N; i++ {
        for j, v := range gs {
            gs[j]++; gs[j] = gs[j] + v + 10; gs[j] += v
        }
    }
}

func BenchmarkArrayGlobal(b *testing.B) {
    for i := 0; i < b.N; i++ {
        for j, v := range ga {
            ga[j]++; ga[j] = ga[j] + v + 10; ga[j] += v
        }
    }
}

func BenchmarkSliceLocal(b *testing.B) {
    var s = make([]byte, 1000)
    for i := 0; i < b.N; i++ {
        for j, v := range s {
            s[j]++; s[j] = s[j] + v + 10; s[j] += v
        }
    }
}

func BenchmarkArrayLocal(b *testing.B) {
    var a [1000]byte
    for i := 0; i < b.N; i++ {
        for j, v := range a {
            a[j]++; a[j] = a[j] + v + 10; a[j] += v
        }
    }
}

我多次运行测试,这是典型的输出(go test -bench .*):

I ran the test multiple times, here is the typical output (go test -bench .*):

BenchmarkSliceGlobal      300000              4210 ns/op
BenchmarkArrayGlobal      300000              4123 ns/op
BenchmarkSliceLocal       500000              3090 ns/op
BenchmarkArrayLocal       500000              3768 ns/op

分析结果:

访问全局切片比访问全局数组稍慢,这与我预期的一样:
4210 vs 4123 ns/op

Accessing the global slice is slightly slower than accessing the global array which is as I expected:
4210 vs 4123 ns/op

但是访问本地切片比访问本地数组要快得多:
3090 vs 3768 ns/op

But accessing the local slice is significantly faster than accessing the local array:
3090 vs 3768 ns/op

我的问题是:这是什么原因?

注意事项

我尝试改变以下内容,但没有改变结果:

I tried varying the following things but none changed the outcome:

  • 数组/切片的大小(尝试了 100、1000、10000)
  • 基准函数的顺序
  • 数组/切片的元素类型(尝试了byteint)

推荐答案

比较 amd64 程序集BenchmarkArrayLocalBenchmarkSliceLocal(太长,不适合这篇文章):

Comparing the amd64 assembly of both BenchmarkArrayLocal and BenchmarkSliceLocal (too long to fit in this post):

数组版本多次从内存中加载a的地址,实际上在每次数组访问操作中:

The array version loads the address of a from memory multiple times, practically on every array-access operation:

LEAQ    "".a+1000(SP),BX

而切片版本在从内存加载一次后仅在寄存器上计算:

Whereas the slice version is computing exclusively on registers after loading once from memory:

LEAQ    (DX)(SI*1),BX

这不是决定性的,但可能是原因.原因是这两种方法实际上是相同的.另一个值得注意的细节是数组版本调用 runtime.duffcopy,这是一个很长的汇编程序,而切片版本则没有.

This is not conclusive but probably the cause. Reason being that both methods are otherwise virtually identical. One other notable detail is that the array version calls into runtime.duffcopy, which is a quite long assembly routine, whereas the slice version doesn't.

这篇关于Array vs Slice:访问速度的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆