Array vs Slice: accessing speed


Question

This question is about the speed of accessing elements of arrays and slices, not about the efficiency of passing them to functions as arguments.

I would expect arrays to be faster than slices in most cases because a slice is a data structure describing a contiguous section of an array and so there may be an extra step involved when accessing elements of a slice (indirectly the elements of its underlying array).
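For illustration only (this sketch is not part of the original benchmarks), the difference in representation can be made visible with unsafe.Sizeof: a slice value is a small header (pointer, length, capacity) referring to a backing array, while an array value is the elements themselves. The sizes in the comments assume a 64-bit platform.

package main

import (
    "fmt"
    "unsafe"
)

func main() {
    var a [1000]byte        // array value: the 1000 bytes themselves
    s := make([]byte, 1000) // slice value: a header pointing at a backing array

    fmt.Println(unsafe.Sizeof(a)) // 1000: the size of the elements
    fmt.Println(unsafe.Sizeof(s)) // 24 on 64-bit: pointer + len + cap

    // Indexing the slice goes through the header's data pointer,
    // which is the potential extra step mentioned above.
    _ = a[10]
    _ = s[10]
}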

So I wrote a little test to benchmark a batch of simple operations. There are 4 benchmark functions, the first 2 test a global slice and a global array, the other 2 test a local slice and a local array:

var gs = make([]byte, 1000) // Global slice
var ga [1000]byte           // Global array

func BenchmarkSliceGlobal(b *testing.B) {
    for i := 0; i < b.N; i++ {
        for j, v := range gs {
            gs[j]++; gs[j] = gs[j] + v + 10; gs[j] += v
        }
    }
}

func BenchmarkArrayGlobal(b *testing.B) {
    for i := 0; i < b.N; i++ {
        for j, v := range ga {
            ga[j]++; ga[j] = ga[j] + v + 10; ga[j] += v
        }
    }
}

func BenchmarkSliceLocal(b *testing.B) {
    var s = make([]byte, 1000)
    for i := 0; i < b.N; i++ {
        for j, v := range s {
            s[j]++; s[j] = s[j] + v + 10; s[j] += v
        }
    }
}

func BenchmarkArrayLocal(b *testing.B) {
    var a [1000]byte
    for i := 0; i < b.N; i++ {
        for j, v := range a {
            a[j]++; a[j] = a[j] + v + 10; a[j] += v
        }
    }
}

I ran the test multiple times; here is a typical output (go test -bench .*):

BenchmarkSliceGlobal      300000              4210 ns/op
BenchmarkArrayGlobal      300000              4123 ns/op
BenchmarkSliceLocal       500000              3090 ns/op
BenchmarkArrayLocal       500000              3768 ns/op

Analyzing the results:

Accessing the global slice is slightly slower than accessing the global array, as I expected:
4210 vs 4123 ns/op

But accessing the local slice is significantly faster than accessing the local array:
3090 vs 3768 ns/op

My question is: What is the reason for this?

Notes

I tried varying the following things but none changed the outcome:

  • the size of the array/slice (tried 100, 1000, 10000; a sub-benchmark sketch for this follows the list)
  • the order of the benchmark functions
  • the element type of the array/slice (tried byte and int)
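For reference, here is a minimal sketch of how the size could be varied inside a single benchmark via sub-benchmarks (testing.B.Run, available in newer Go versions); this is not how the numbers above were produced. It would sit in the same _test.go file as the benchmarks above and additionally needs the fmt import:

// BenchmarkSliceSizes repeats the slice loop at several sizes so a
// single run reports one result per size.
func BenchmarkSliceSizes(b *testing.B) {
    for _, size := range []int{100, 1000, 10000} {
        s := make([]byte, size)
        b.Run(fmt.Sprintf("size=%d", size), func(b *testing.B) {
            for i := 0; i < b.N; i++ {
                for j, v := range s {
                    s[j]++; s[j] = s[j] + v + 10; s[j] += v
                }
            }
        })
    }
}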

Solution

Comparing the amd64 assembly of both BenchmarkArrayLocal and BenchmarkSliceLocal (too long to fit in this post):

The array version loads the address of a from memory multiple times, practically on every array-access operation:

LEAQ    "".a+1000(SP),BX

The slice version, by contrast, computes exclusively in registers after loading the address once from memory:

LEAQ    (DX)(SI*1),BX

This is not conclusive, but it is probably the cause, since the two methods are otherwise virtually identical. One other notable detail is that the array version calls into runtime.duffcopy, a fairly long assembly routine, whereas the slice version does not.
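As a follow-up experiment (not part of the original answer): if the runtime.duffcopy call indeed comes from the range expression copying the local array value, then ranging over a slice of the array (a[:]) should avoid that copy while leaving the loop body unchanged. A minimal sketch, meant to sit alongside the benchmarks above:

// BenchmarkArrayLocalSliced is identical to BenchmarkArrayLocal except
// that it ranges over a[:], so the range expression is a slice header
// rather than a copy of the 1000-byte array value.
func BenchmarkArrayLocalSliced(b *testing.B) {
    var a [1000]byte
    for i := 0; i < b.N; i++ {
        for j, v := range a[:] {
            a[j]++; a[j] = a[j] + v + 10; a[j] += v
        }
    }
}

To reproduce the assembly comparison itself, the test binary can be built with go test -c and disassembled with go tool objdump -s BenchmarkArrayLocal on the resulting .test binary, or the compiler's assembly can be printed with go test -gcflags=-S.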
