Appending to slice bad performance.. why?


Question

I'm currently creating a game in Go and measuring the FPS. I'm noticing a loss of about 7 fps when I use a for loop to append to slices, like so:

vertexInfo := Opengl.OpenGLVertexInfo{}

for i := 0; i < 4; i = i + 1 {
    vertexInfo.Translations = append(vertexInfo.Translations, float32(s.x), float32(s.y), 0)
    vertexInfo.Rotations = append(vertexInfo.Rotations, 0, 0, 1, s.rot)
    vertexInfo.Scales = append(vertexInfo.Scales, s.xS, s.yS, 0)
    vertexInfo.Colors = append(vertexInfo.Colors, s.r, s.g, s.b, s.a)

}

I'm doing this for every sprite, every draw. The question is: why do I get such a huge performance hit from just looping four times and appending the same thing to these slices? Is there a more efficient way to do this? It is not like I'm adding an exorbitant amount of data. Each slice contains about 16 elements, as shown above (4 x 4).

When I simply put all 16 elements in one []float32{1..16} then fps is improved by about 4.

Update: I benchmarked each append, and each one seems to cost about 1 fps. That seems like a lot considering this data is pretty static. I only need 4 iterations.

Update: Added github repo https://github.com/Triangle345/GT

Solution

The builtin append() needs to create a new backing array if the capacity of the destination slice is less than the length the slice would have after the append. It then also has to copy the current elements from the old backing array to the newly allocated one, so there is a lot of overhead.
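
To make the reallocation cost concrete, here is a minimal sketch that counts how often the capacity changes while appending in a loop like the one in the question. The function name countReallocations is made up for this illustration, and the exact growth steps depend on the Go version, so treat the count for the nil slice as indicative only.

```go
package main

import "fmt"

// countReallocations (hypothetical helper) appends perIter zero values to s,
// iterations times, and reports how often the capacity changed, which
// happens only when append had to allocate a new backing array.
func countReallocations(s []float32, iterations, perIter int) int {
	reallocs := 0
	vals := make([]float32, perIter)
	for i := 0; i < iterations; i++ {
		before := cap(s)
		s = append(s, vals...)
		if cap(s) != before {
			reallocs++
		}
	}
	return reallocs
}

func main() {
	// Starting from a nil slice, as in the question's loop.
	fmt.Println("nil slice:", countReallocations(nil, 4, 4))
	// Starting from a slice that already has room for all 16 elements.
	fmt.Println("preallocated:", countReallocations(make([]float32, 0, 16), 4, 4))
}
```

With enough capacity up front, the second call never reallocates, while the first has to grow the backing array more than once.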

The slices you append to are most likely empty (nil) slices, since you created your Opengl.OpenGLVertexInfo value with an empty struct literal, which leaves its fields at their zero values. Even though append() plans ahead and allocates a bigger array than is needed to hold the appended elements, chances are that in your case multiple reallocations will be needed to complete the 4 iterations.

You may avoid reallocations if you create and initialize vertexInfo like this:

vertexInfo := Opengl.OpenGLVertexInfo{
    Translations: []float32{float32(s.x), float32(s.y), 0, float32(s.x), float32(s.y), 0, float32(s.x), float32(s.y), 0, float32(s.x), float32(s.y), 0},
    Rotations:    []float64{0, 0, 1, s.rot, 0, 0, 1, s.rot, 0, 0, 1, s.rot, 0, 0, 1, s.rot},
    Scales:       []float64{s.xS, s.yS, 0, s.xS, s.yS, 0, s.xS, s.yS, 0, s.xS, s.yS, 0},
    Colors:       []float64{s.r, s.g, s.b, s.a, s.r, s.g, s.b, s.a, s.r, s.g, s.b, s.a, s.r, s.g, s.b, s.a},
}

Note that with this struct literal each slice is created at its final length in a single allocation, so no backing array has to be reallocated. But if other parts of your code (which we don't see) append further elements to these slices, those appends may cause reallocations. If that is the case, create the slices with a larger capacity that covers the future appends (e.g. make([]float64, 16, 32)).
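
Since the question builds this data for every sprite on every draw, another option is to allocate the slices once with make() and reuse them. The sketch below assumes a simplified stand-in type, vertexInfo, with only two float32 fields; the real Opengl.OpenGLVertexInfo field types are not shown in the question, and newVertexInfo and fill are hypothetical helpers.

```go
package main

import "fmt"

// vertexInfo is a hypothetical stand-in for Opengl.OpenGLVertexInfo;
// the real field set and element types are assumptions.
type vertexInfo struct {
	Translations []float32
	Rotations    []float32
}

// newVertexInfo preallocates backing arrays large enough for the given
// number of vertices (3 translation and 4 rotation values per vertex).
func newVertexInfo(vertices int) *vertexInfo {
	return &vertexInfo{
		Translations: make([]float32, 0, 3*vertices),
		Rotations:    make([]float32, 0, 4*vertices),
	}
}

// fill resets both slices to length 0, keeping their backing arrays,
// and appends fresh values. No allocation happens as long as the
// capacity chosen in newVertexInfo is not exceeded.
func (v *vertexInfo) fill(x, y, rot float32, vertices int) {
	v.Translations = v.Translations[:0]
	v.Rotations = v.Rotations[:0]
	for i := 0; i < vertices; i++ {
		v.Translations = append(v.Translations, x, y, 0)
		v.Rotations = append(v.Rotations, 0, 0, 1, rot)
	}
}

func main() {
	v := newVertexInfo(4)
	// Simulate several draws that reuse the same buffers.
	for frame := 0; frame < 3; frame++ {
		v.fill(float32(frame), 2, 0.5, 4)
	}
	fmt.Println(len(v.Translations), cap(v.Translations))
	fmt.Println(len(v.Rotations), cap(v.Rotations))
}
```

Re-slicing to [:0] keeps the capacity, so each later draw overwrites the same memory instead of allocating new arrays. This only works if nothing else still holds a reference to the old contents between draws.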
