Why do variables in a thread block have the same memory address? (CUDA)


Question

I am wondering why they have the same memory address when, if I remember correctly, each thread has its own copy of a variable created this way:

__global__ void
Matrix_Multiplication_Shared(
   const int* const Matrix_A, 
   const int* const Matrix_B, 
         int* const Matrix_C)
{   
    const int sum_value = threadIdx.x;
    printf("%p \n", &sum_value);
}

Output:

I am considering the case of a single thread block with two or more threads.

Recommended answer

NVIDIA GPUs have multiple address spaces.

The primary virtual address space used by pointers is called the generic address space. Inside the generic address space are windows for local memory and shared memory; the rest of the generic address space is the global address space. PTX and the GPU instruction set also provide additional instructions for 0-based access to the local and shared memory address spaces.
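As a rough illustration of these windows, the sketch below asks the device which window a generic pointer falls into, using the address space predicate functions `__isGlobal`, `__isShared` and `__isLocal` from the CUDA C++ Programming Guide. The names `SpaceFlags`, `classify`, `classify_kernel` and `run_classify` are made up for this example, and it assumes a CUDA-capable device is present.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper type: one flag per address space window (1 = inside).
struct SpaceFlags {
    unsigned int is_global;
    unsigned int is_shared;
    unsigned int is_local;
};

// Query the address space predicates for one generic pointer.
__device__ SpaceFlags classify(const void* p)
{
    SpaceFlags f;
    f.is_global = __isGlobal(p);
    f.is_shared = __isShared(p);
    f.is_local  = __isLocal(p);
    return f;
}

__global__ void classify_kernel(int* global_buf, SpaceFlags* out)
{
    __shared__ int shared_value;      // lives in the shared memory window
    int local_value = threadIdx.x;    // automatic variable whose address is taken

    out[0] = classify(global_buf);
    out[1] = classify(&shared_value);
    out[2] = classify(&local_value);
}

// Runs the kernel with a single thread and copies the three results back.
bool run_classify(SpaceFlags result[3])
{
    int* global_buf = nullptr;
    SpaceFlags* d_out = nullptr;
    bool ok = cudaMalloc(&global_buf, sizeof(int)) == cudaSuccess
           && cudaMalloc(&d_out, 3 * sizeof(SpaceFlags)) == cudaSuccess;
    if (ok) {
        classify_kernel<<<1, 1>>>(global_buf, d_out);
        ok = cudaMemcpy(result, d_out, 3 * sizeof(SpaceFlags),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_out);       // cudaFree(nullptr) is a no-op
    cudaFree(global_buf);
    return ok;
}

int main()
{
    SpaceFlags r[3];
    if (!run_classify(r)) {
        fprintf(stderr, "CUDA call failed\n");
        return 1;
    }
    const char* names[3] = {"cudaMalloc buffer", "__shared__ variable", "automatic variable"};
    for (int i = 0; i < 3; ++i)
        printf("%-20s global=%u shared=%u local=%u\n",
               names[i], r[i].is_global, r[i].is_shared, r[i].is_local);
    return 0;
}
```

The pointers passed to `classify` are all ordinary generic pointers; only the window they fall into differs.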

Some automatic variables and stack memory are in the local memory address space. The primary difference between global memory and local memory is that local memory is organized so that consecutive 32-bit words are accessed by consecutive thread IDs. If every thread reads or writes the same local memory offset, the memory access is fully coalesced.

In PTX, local memory is accessed via ld.local and st.local.
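To see these instructions yourself, one option is a kernel with a per-thread array that is indexed at run time. Compilers typically place such an array in local memory, although that is a compiler decision and not guaranteed. The sketch below is an assumption-laden example (the names `local_array_demo` and `run_local_array_demo` are invented here); compiling it with `nvcc -ptx` and searching the output for `ld.local` and `st.local` should usually show the accesses.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kThreads = 4;
constexpr int kSlots = 8;   // power of two so the index can be masked

__global__ void local_array_demo(const int* index, int* out)
{
    // Per-thread scratch array; the run-time index below usually keeps it
    // out of registers and in local memory.
    int scratch[kSlots];
    for (int i = 0; i < kSlots; ++i)
        scratch[i] = i * (threadIdx.x + 1);

    out[threadIdx.x] = scratch[index[threadIdx.x] & (kSlots - 1)];
}

// Copies the indices to the device, runs one block, and copies results back.
bool run_local_array_demo(const int h_index[kThreads], int h_out[kThreads])
{
    int* d_index = nullptr;
    int* d_out = nullptr;
    const size_t bytes = kThreads * sizeof(int);
    bool ok = cudaMalloc(&d_index, bytes) == cudaSuccess
           && cudaMalloc(&d_out, bytes) == cudaSuccess
           && cudaMemcpy(d_index, h_index, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        local_array_demo<<<1, kThreads>>>(d_index, d_out);
        ok = cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_out);     // cudaFree(nullptr) is a no-op
    cudaFree(d_index);
    return ok;
}

int main()
{
    const int index[kThreads] = {0, 3, 5, 7};
    int out[kThreads] = {0};
    if (!run_local_array_demo(index, out)) {
        fprintf(stderr, "CUDA call failed\n");
        return 1;
    }
    for (int t = 0; t < kThreads; ++t)
        printf("thread %d read scratch[%d] = %d\n", t, index[t], out[t]);
    return 0;
}
```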

In GPU SASS, the instructions have two forms:

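  1. LDL, STL access local memory directly, with the address given as a 0-based offset.
  2. LD, ST can be used for local memory access through the generic local memory window.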

When you take the address of the variable, the generic address space address is returned. Each thread sees the same offset from the base pointer of the generic local memory window, which is why every thread prints the same address. The load/store unit converts the 0-based offset into a unique per-thread global address, so each thread still has its own copy of the variable.
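A minimal sketch of this behaviour, assuming a CUDA-capable device: each thread records the generic address of its automatic variable and the value it reads back through that address. The names `record_address_and_value` and `run_record` are invented for this example. On the hardware described above, the recorded addresses would be expected to match across threads while the values differ, but the sketch only prints what it observes.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kThreads = 4;

__global__ void record_address_and_value(unsigned long long* addr, int* value)
{
    int sum_value = 0;
    volatile int* p = &sum_value;   // generic address of this thread's copy
    *p = 10 * threadIdx.x;          // write through the pointer

    addr[threadIdx.x]  = reinterpret_cast<unsigned long long>(&sum_value);
    value[threadIdx.x] = *p;        // read back this thread's own value
}

// Runs one block of kThreads threads and copies addresses and values back.
bool run_record(unsigned long long h_addr[kThreads], int h_value[kThreads])
{
    unsigned long long* d_addr = nullptr;
    int* d_value = nullptr;
    bool ok = cudaMalloc(&d_addr, kThreads * sizeof(unsigned long long)) == cudaSuccess
           && cudaMalloc(&d_value, kThreads * sizeof(int)) == cudaSuccess;
    if (ok) {
        record_address_and_value<<<1, kThreads>>>(d_addr, d_value);
        ok = cudaMemcpy(h_addr, d_addr, kThreads * sizeof(unsigned long long),
                        cudaMemcpyDeviceToHost) == cudaSuccess
          && cudaMemcpy(h_value, d_value, kThreads * sizeof(int),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_value);   // cudaFree(nullptr) is a no-op
    cudaFree(d_addr);
    return ok;
}

int main()
{
    unsigned long long addr[kThreads] = {0};
    int value[kThreads] = {0};
    if (!run_record(addr, value)) {
        fprintf(stderr, "CUDA call failed\n");
        return 1;
    }
    for (int t = 0; t < kThreads; ++t)
        printf("thread %d: address=0x%llx value=%d\n", t, addr[t], value[t]);
    return 0;
}
```

Each thread reading back its own value through what looks like a shared address is the point of the answer: the generic address is a per-thread view, not a single shared location.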

For more information, see:

The PTX ISA section on Generic Addressing. Details on local memory are scattered throughout the manual.
