CUDA and pinned (page locked) memory not page locked at all?


Problem description


I am trying to figure out whether CUDA (or the OpenCL implementation) tells the truth when I request pinned (page-locked) memory.

I tried cudaMallocHost and looked at the /proc/meminfo values Mlocked and Unevictable; both stay at 0 and never go up (/proc/<pid>/status reports VmLck as 0 as well). When I use mlock to page-lock memory, the values go up as expected.

So two possible reasons for this behavior might be:

  1. I don't get page locked memory from the CUDA API, and the cudaSuccess return value is bogus
  2. CUDA bypasses the OS counters for page locked memory because CUDA does some magic with the Linux kernel

So the actual question is: Why can’t I get the values for page locked memory from the OS when I use CUDA to allocate page locked memory?

Additionally: Where can I get the right values if not from /proc/meminfo or /proc/<pid>/status?

Thanks!

System: Ubuntu 14.04.01 LTS; CUDA 6.5; Nvidia Driver 340.29; Nvidia Tesla K20c

Solution

It would seem that the pinned allocator in CUDA 6.5 is, under the hood, using mmap() with MAP_FIXED. Although I am not an OS expert, I believe this will have the effect of "pinning" memory, i.e. ensuring that its address never changes.

Let's consider a short test program:

#include <stdio.h>
#include <stdlib.h>        // for system()
#include <cuda_runtime.h>  // included implicitly by nvcc; shown for clarity

#define DSIZE (1048576*1024)

int main(){

  int *data;
  cudaFree(0);                                        // force CUDA context creation
  system("cat /proc/meminfo > out1.txt");             // snapshot before alloc
  printf("*$*before alloc\n");
  cudaHostAlloc(&data, DSIZE, cudaHostAllocDefault);  // 1 GB pinned allocation
  printf("*$*after alloc\n");
  system("cat /proc/meminfo > out2.txt");             // snapshot after alloc
  cudaFreeHost(data);
  system("cat /proc/meminfo > out3.txt");             // snapshot after free
  return 0;
}

If we run this program with strace, and excerpt the output part between the printf statements, we have:

write(1, "*$*before alloc\n", 16*$*before alloc)       = 16
mmap(0x204500000, 1073741824, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_FIXED|MAP_ANONYMOUS, 0, 0) = 0x204500000
ioctl(11, 0xc0304627, 0x7fffcf72cce0)   = 0
ioctl(3, 0xc0384657, 0x7fffcf72cd70)    = 0
write(1, "*$*after alloc\n", 15*$*after alloc)        = 15

(note that 1073741824 is exactly one gigabyte, i.e. the same as the requested 1048576*1024)

Reviewing the description of mmap, we have:

address gives a preferred starting address for the mapping. NULL expresses no preference. Any previous mapping at that address is automatically removed. The address you give may still be changed, unless you use the MAP_FIXED flag.

Therefore, assuming the mmap command is successful, the memory address requested will be fixed, and therefore the memory is "pinned".

This mechanism apparently does not use mlock(), and so the mlock'ed page counters don't change before and after. However, we would expect a change in the mapping statistics, and if we diff the out1.txt and out2.txt produced by the above program, we see (excerpted):

< Mapped:            87488 kB
---
> Mapped:          1135904 kB

The difference is approximately a gigabyte, the amount of "pinned" memory requested.

