Is private memory slower than local memory?


Question

I was working on a kernel that made many global memory accesses per thread, so I copied the data to local memory, which gave a speedup of 40%.

I wanted still more speedup, so I copied the data from local to private memory, but that degraded the performance.

So am I right in thinking that we must not use too much private memory, because it may degrade performance?

Recommended answer

Ashwin's answer is in the right direction but a little misleading.

OpenCL abstracts the address space of variables away from their physical storage, and there is not necessarily a 1:1 mapping between the two.

Consider OpenCL variables declared in the __private address space, which includes automatic non-pointer variables inside functions by default. The NVIDIA GPU implementation will physically allocate these in registers wherever possible, only spilling over to physical off-chip memory when there is insufficient register capacity. This particular off-chip memory is called "CUDA local" memory, and has similar performance characteristics to memory allocated for __global variables, which explains the performance penalty due to register spill-over. There is no such physical thing as "private memory" in this implementation, only a "private address space", which may be allocated on- or off-chip.

The performance hit is not a direct consequence of using the private address space (or "private memory"), which is typically allocated in high-performance memory. It is because, under this implementation, the variable was too large to be allocated in high-performance registers, and was therefore "spilled over" to off-chip memory.
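To make the distinction concrete, here is a minimal sketch of two kernels that both use only the __private address space but differ in how much private storage each work-item needs. The kernel names, the constant N, and the data layout are assumptions made for illustration and are not taken from the question. The shim section at the top is also an assumption: it only exists so the same text compiles as ordinary C for a quick host-side check. Whether the array in the second kernel actually spills depends on the device and compiler.

```c
/* Illustrative sketch: N, the kernel names and the data layout are assumptions. */
#ifndef __OPENCL_VERSION__
/* Host-side shims so this text also builds as plain C for a sanity check.
   An OpenCL C compiler defines __OPENCL_VERSION__ and skips this part. */
#include <stddef.h>
#define __kernel
#define __global
#define __private
static size_t fake_gid = 0;  /* stand-in for the current work-item id */
static size_t get_global_id(unsigned int dim) { (void)dim; return fake_gid; }
#endif

#define N 256  /* elements handled by each work-item (assumed) */

/* Small private footprint: one accumulator and a loop counter,
   which a compiler can normally keep in registers. */
__kernel void sum_direct(__global const float *in, __global float *out)
{
    size_t gid = get_global_id(0);
    float acc = 0.0f;
    for (int i = 0; i < N; ++i)
        acc += in[gid * N + i];
    out[gid] = acc;
}

/* Large private footprint: a 1 KiB array per work-item. It is still in the
   __private address space, but the compiler may not be able to keep it in
   registers and may place it in off-chip memory instead. */
__kernel void sum_via_private_copy(__global const float *in, __global float *out)
{
    size_t gid = get_global_id(0);
    __private float scratch[N];
    for (int i = 0; i < N; ++i)
        scratch[i] = in[gid * N + i];  /* copy into private storage */
    float acc = 0.0f;
    for (int i = 0; i < N; ++i)
        acc += scratch[i];             /* then read it back */
    out[gid] = acc;
}
```

Both kernels produce the same result. Any speed difference would come from where the implementation physically places `scratch`, not from the address space itself. On NVIDIA's OpenCL implementation, building with the `-cl-nv-verbose` option and reading the build log may show whether local (spill) memory was used.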
