Appending to slice bad performance.. why?

Problem description

I'm currently creating a game in Go and measuring the FPS. I'm noticing a loss of about 7 fps when I use a for loop to append to slices, like so:

vertexInfo := Opengl.OpenGLVertexInfo{}

for i := 0; i < 4; i = i + 1 {
    vertexInfo.Translations = append(vertexInfo.Translations, float32(s.x), float32(s.y), 0)
    vertexInfo.Rotations = append(vertexInfo.Rotations, 0, 0, 1, s.rot)
    vertexInfo.Scales = append(vertexInfo.Scales, s.xS, s.yS, 0)
    vertexInfo.Colors = append(vertexInfo.Colors, s.r, s.g, s.b, s.a)

}

I'm doing this for every sprite, every draw. The question is: why do I get such a huge performance hit from just looping four times and appending the same thing to these slices? Is there a more efficient way to do this? It is not like I'm adding an exorbitant amount of data. Each slice contains about 16 elements as shown above (4 x 4).

When I instead put all 16 elements in a single []float32{1..16} literal, the fps improves by about 4.

Update: I benchmarked each append, and each one seems to cost about 1 fps. That seems like a lot considering this data is pretty static and I only need 4 iterations.

Update: Added github repo https://github.com/Triangle345/GT

Solution

The builtin append() has to allocate a new backing array if the capacity of the destination slice is less than the length the slice would have after the append. It then also has to copy the existing elements from the old array into the newly allocated one, so there is considerable overhead.

The slices you append to are most likely empty (nil), since you created your Opengl.OpenGLVertexInfo value with an empty struct literal. Even though append() plans ahead and allocates a bigger array than is needed for just the appended elements, chances are that in your case several reallocations are needed to complete the 4 iterations.
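To see the effect described above, here is a minimal sketch that counts how often append() has to grow the backing array during four appends of four values each. The helper name appendQuads and the sample values are hypothetical and not taken from the question's code; the exact growth pattern of a nil slice depends on the Go version, so the program prints the counts rather than assuming them.

```go
package main

import "fmt"

// appendQuads (hypothetical helper) appends four float32 values to dst
// four times. It returns the resulting slice and the number of appends
// that changed the capacity, i.e. that allocated a new backing array.
func appendQuads(dst []float32) ([]float32, int) {
	reallocs := 0
	for i := 0; i < 4; i++ {
		before := cap(dst)
		dst = append(dst, 0, 0, 1, 0.5)
		if cap(dst) != before {
			reallocs++
		}
	}
	return dst, reallocs
}

func main() {
	// Starting from a nil slice, as in the question's loop.
	_, fromNil := appendQuads(nil)
	fmt.Println("reallocations starting from nil:", fromNil)

	// Starting from a slice whose capacity already covers all 16 values.
	_, preallocated := appendQuads(make([]float32, 0, 16))
	fmt.Println("reallocations with capacity 16:", preallocated)
}
```

With the capacity reserved up front, append() reuses the same array for every iteration; starting from nil, it has to allocate at least once and typically more often.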

You may avoid reallocations if you create and initialize vertexInfo like this:

vertexInfo := Opengl.OpenGLVertexInfo{
    Translations: []float32{float32(s.x), float32(s.y), 0, float32(s.x), float32(s.y), 0, float32(s.x), float32(s.y), 0, float32(s.x), float32(s.y), 0},
    Rotations:    []float64{0, 0, 1, s.rot, 0, 0, 1, s.rot, 0, 0, 1, s.rot, 0, 0, 1, s.rot},
    Scales:       []float64{s.xS, s.yS, 0, s.xS, s.yS, 0, s.xS, s.yS, 0, s.xS, s.yS, 0},
    Colors:       []float64{s.r, s.g, s.b, s.a, s.r, s.g, s.b, s.a, s.r, s.g, s.b, s.a, s.r, s.g, s.b, s.a},
}

Note that with this struct literal, the arrays behind the slices are allocated once, at their final size. But if other parts of your code (which we don't see) append further elements to these slices, those appends may cause reallocations. If that is the case, create the slices with a bigger capacity that covers the "future" appends (e.g. make([]float64, 16, 32)).
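As a rough illustration of reserving capacity for later appends, the sketch below builds a 16-element slice with room for more and then appends to it without triggering a reallocation. The function newRotations, its parameters, and the sample values are assumptions made for this example; they are not part of the question's code.

```go
package main

import "fmt"

// newRotations (hypothetical helper) builds the 16 rotation values for one
// sprite in a slice that reserves room for `extra` further elements.
func newRotations(rot float64, extra int) []float64 {
	rotations := make([]float64, 0, 16+extra)
	for i := 0; i < 4; i++ {
		rotations = append(rotations, 0, 0, 1, rot)
	}
	return rotations
}

func main() {
	r := newRotations(0.5, 16)
	fmt.Println(len(r), cap(r)) // length 16, capacity 32

	// A later append fits into the reserved capacity, so the slice
	// still points at the same backing array afterwards.
	first := &r[0]
	r = append(r, 0, 0, 1, 0.5)
	fmt.Println(first == &r[0])
}
```

How much extra capacity to reserve is a trade-off between wasted memory and avoided reallocations; it only pays off if the later appends actually happen.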
