去:多个len()调用与性能? [英] Go: multiple len() calls vs performance?

查看:52
本文介绍了去:多个len()调用与性能?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

目前,我正在实现一些排序算法.由于这是算法的本质,因此使用 len()方法对许多数组/切片的长度进行了很多调用.

At the moment I am implementing some sorting algorithms. As it's in the nature of algorithms, there are a lot of calls on the length of some arrays/slices using the len() method.

现在,为Mergesort算法(的一部分)提供以下代码:

Now, given the following code for a (part of) the Mergesort algorithm:

  for len(left) > 0 || len(right) > 0 {
        if len(left) > 0 && len(right) > 0 {
            if left[0] <= right[0] {
                result = append(result, left[0])
                left = left[1:len(left)]
            } else {
                result = append(result, right[0])
                right = right[1:len(right)]
            }
        } else if len(left) > 0 {
            result = append(result, left[0])
            left = left[1:len(left)]
        } else if len(right) > 0 {
            result = append(result, right[0])
            right = right[1:len(right)]
        }
    }

我的问题是:这些多个len()调用是否会对算法的性能产生负面影响?为 right left 切片的长度制作一个临时变量更好吗?还是编译器自己执行此操作?

My question is: Do these multiple len() calls affect the performance of the algorithm negatively? Is it better to make a temporary variable for the length of the right and left slice? Or does the compiler does this itself?

推荐答案

有两种情况:

  • 本地切片:长度将被缓存并且没有开销
  • 全局切片或通过引用传递(通过引用):无法缓存长度并且存在开销
  • Local slice: length will be cached and there is no overhead
  • Global slice or passed (by reference): length cannot be cached and there is overhead

对于本地定义的片,长度被缓存,因此没有运行时开销.您可以在以下程序的程序集中看到这一点:

For locally defined slices the length is cached, so there is no runtime overhead. You can see this in the assembly of the following program:

func generateSlice(x int) []int {
    return make([]int, x)
}

func main() {
    x := generateSlice(10)
    println(len(x))
}

使用 go工具6g -S test.go 进行编译,除其他外,这将产生以下几行:

Compiled with go tool 6g -S test.go this yields, amongst other things, the following lines:

MOVQ    "".x+40(SP),BX
MOVQ    BX,(SP)
// ...
CALL    ,runtime.printint(SB)

这里发生的是,第一行通过获取距 x 开头40个字节的值来检索 x 的长度,最重要的是将此值缓存在 BX ,然后用于每次出现 len(x)的情况.偏移量的原因是数组具有以下结构(

What happens here is that the first line retrieves the length of x by getting the value located 40 bytes from the beginning of x and most importantly caches this value in BX, which is then used for every occurrence of len(x). The reason for the offset is that an array has the following structure (source):

typedef struct
{               // must not move anything
    uchar   array[8];   // pointer to data
    uchar   nel[4];     // number of elements
    uchar   cap[4];     // allocated number of elements
} Array;

nel len()所访问的内容.您可以在代码生成.

nel is what is accessed by len(). You can see this in the code generation as well.

对于共享值,长度的缓存是不可能的,因为编译器必须假定切片在两次调用之间发生变化.因此,编译器必须编写每次都直接访问length属性的代码.示例:

For shared values caching of the length is not possible since the compiler has to assume that the slice changes between calls. Therefore the compiler has to write code that accesses the length attribute directly every time. Example:

func accessLocal() int {
    a := make([]int, 1000) // local
    count := 0
    for i := 0; i < len(a); i++ {
        count += len(a)
    }
    return count
}

var ag = make([]int, 1000) // pseudo-code

func accessGlobal() int {
    count := 0
    for i := 0; i < len(ag); i++ {
        count += len(ag)
    }
    return count
}

比较这两个函数的汇编产生了一个关键的区别,那就是,一旦变量是全局变量,就不再缓存对 nel 属性的访问,并且会产生运行时开销:

Comparing the assembly of both functions yields the crucial difference that as soon as the variable is global the access to the nel attribute is not cached anymore and there will be a runtime overhead:

// accessLocal
MOVQ    "".a+8048(SP),SI // cache length in SI
// ...
CMPQ    SI,AX            // i < len(a)
// ...
MOVQ    SI,BX
ADDQ    CX,BX
MOVQ    BX,CX            // count += len(a)

// accessGlobal
MOVQ    "".ag+8(SB),BX
CMPQ    BX,AX            // i < len(ag)
// ...
MOVQ    "".ag+8(SB),BX
ADDQ    CX,BX
MOVQ    BX,CX            // count += len(ag)

这篇关于去:多个len()调用与性能?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆