Why does the Cuda runtime reserve 80 GiB virtual memory upon initialization?


Problem description

I was profiling my Cuda 4 program and it turned out that at some stage the running process used over 80 GiB of virtual memory. That was a lot more than I would have expected. After examining how the memory map evolved over time and matching it against the line of code being executed, it turned out that after these simple instructions the virtual memory usage jumped to over 80 GiB:

  #include <cstdio>          // for perror
  #include <cuda_runtime.h>  // for cudaGetDeviceCount

  int deviceCount;
  cudaGetDeviceCount(&deviceCount);  // first CUDA call: initializes the runtime
  if (deviceCount == 0) {
    perror("No devices supporting CUDA");
  }

Clearly, this is the first Cuda call, thus the runtime got initialized. After this the memory map looks like (truncated):

Address           Kbytes     RSS   Dirty Mode   Mapping
0000000000400000   89796   14716       0 r-x--  prg
0000000005db1000      12      12       8 rw---  prg
0000000005db4000      80      76      76 rw---    [ anon ]
0000000007343000   39192   37492   37492 rw---    [ anon ]
0000000200000000    4608       0       0 -----    [ anon ]
0000000200480000    1536    1536    1536 rw---    [ anon ]
0000000200600000 83879936       0       0 -----    [ anon ]

Now this huge memory area is mapped into the virtual memory space.
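For reference, the jump can be reproduced by reading VmSize from /proc/self/status just before and just after the first runtime call. This is only a minimal sketch, not code from the original post; the helper vm_size_kib is mine, for illustration:

  // Minimal sketch: print the process's virtual memory size (VmSize from
  // /proc/self/status) before and after the first CUDA runtime call, to see
  // the jump caused by runtime initialization.
  #include <cstdio>
  #include <cstring>
  #include <cuda_runtime.h>

  // Returns VmSize in KiB as reported by /proc/self/status, or 0 on failure.
  static long vm_size_kib(void) {
    FILE *f = fopen("/proc/self/status", "r");
    if (!f) return 0;
    char line[256];
    long kib = 0;
    while (fgets(line, sizeof(line), f)) {
      if (strncmp(line, "VmSize:", 7) == 0) {
        sscanf(line + 7, "%ld", &kib);
        break;
      }
    }
    fclose(f);
    return kib;
  }

  int main(void) {
    printf("VmSize before: %ld KiB\n", vm_size_kib());
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);  // first CUDA call, initializes the runtime
    printf("VmSize after:  %ld KiB (devices: %d)\n", vm_size_kib(), deviceCount);
    return 0;
  }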

Okay, it's maybe not a big problem, since reserving/allocating memory in Linux doesn't do much unless you actually write to that memory. But it's really annoying because, for example, MPI jobs have to be specified with the maximum amount of vmem usable by the job. And 80 GiB is then just a lower bound for Cuda jobs - one has to add all the other stuff too.
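As an aside: if the batch system turns that vmem specification into an address-space limit, one can check what the process actually runs under with getrlimit. A minimal sketch, assuming the limit is enforced as RLIMIT_AS (this is an assumption, not something stated above):

  // Minimal sketch (assumption: the scheduler enforces the vmem limit via
  // RLIMIT_AS): print the address-space limit the process runs under, which
  // would have to exceed the ~80 GiB reservation plus everything else.
  #include <cstdio>
  #include <sys/resource.h>

  int main(void) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_AS, &rl) == 0) {
      if (rl.rlim_cur == RLIM_INFINITY)
        printf("RLIMIT_AS: unlimited\n");
      else
        printf("RLIMIT_AS: %llu KiB\n", (unsigned long long)(rl.rlim_cur / 1024));
    }
    return 0;
  }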

I can imagine that it has to do with the so-called scratch space that Cuda maintains: a kind of memory pool for kernel code that can grow and shrink dynamically. But that's speculation, and that space is allocated in device memory anyway.

Any insights?

Solution

Nothing to do with scratch space; it is the result of the addressing system that allows unified addressing and peer-to-peer access between the host and multiple GPUs. The CUDA driver registers all the GPU memory plus host memory in a single virtual address space using the kernel's virtual memory system. It isn't actually memory consumption per se; it is just a "trick" to map all the available address spaces into a linear virtual space for unified addressing.
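To see which devices this mapping is set up for, one can query the unified-addressing flag and peer-access support through the runtime API. A minimal sketch using standard CUDA calls (not part of the original answer):

  // Minimal sketch: report, per device, whether it participates in unified
  // virtual addressing (UVA) and whether peer access to the other devices is
  // possible; these are the features the large virtual reservation supports.
  #include <cstdio>
  #include <cuda_runtime.h>

  int main(void) {
    int n = 0;
    cudaGetDeviceCount(&n);
    for (int i = 0; i < n; ++i) {
      cudaDeviceProp prop;
      cudaGetDeviceProperties(&prop, i);
      printf("device %d (%s): unifiedAddressing=%d\n", i, prop.name, prop.unifiedAddressing);
      for (int j = 0; j < n; ++j) {
        if (i == j) continue;
        int canAccess = 0;
        cudaDeviceCanAccessPeer(&canAccess, i, j);
        printf("  peer access %d -> %d: %s\n", i, j, canAccess ? "yes" : "no");
      }
    }
    return 0;
  }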
