OpenCL clCreateContextFromType function results in memory leaks

Question


I ran valgrind on one of my open-source OpenCL codes (https://github.com/fangq/mmc), and it detected a lot of memory leaks in the OpenCL host code. Most of them point back to the line where I create the context object using clCreateContextFromType.

I double-checked all my OpenCL variables, command queues, kernels, and programs, and made sure that they are all properly released. Still, when testing on sample programs, every call to the mmc_run_cl() function bumps memory usage up by 300 MB-400 MB, and that memory is not released on return.
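For reference, the cleanup at the end of mmc_run_cl() follows the usual OpenCL release pattern, roughly like the simplified sketch below (this is not the exact mmc code; mcxkernel and devcount are placeholder names, while mcxcontext, mcxqueue, mcxprogram, gprogress, and progress appear in the table further down):

/* simplified sketch of the release order at the end of mmc_run_cl() */
clEnqueueUnmapMemObject(mcxqueue[0], gprogress[0], progress, 0, NULL, NULL);
clReleaseMemObject(gprogress[0]);
for (i = 0; i < devcount; i++) {          /* devcount: placeholder loop bound */
    clReleaseKernel(mcxkernel[i]);        /* mcxkernel: placeholder kernel array name */
    clReleaseCommandQueue(mcxqueue[i]);
}
clReleaseProgram(mcxprogram);
clReleaseContext(mcxcontext);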

You can reproduce the valgrind report by running the commands below in a terminal:

git clone https://github.com/fangq/mmc.git
cd mmc/src
make clean
make all
cd ../examples/validation
valgrind --show-leak-kinds=all --leak-check=full ../../src/bin/mmc -f cube2.inp -G 1 -s cube2 -n 1e4 -b 0 -D TP -M G -F bin

This assumes your system has gcc, git, libOpenCL, and valgrind installed. Change the -G 1 input to a different number if you want to run it on another OpenCL device (add -L to list them).

In the table below, I list the repeated count of each valgrind-detected leak on an NVIDIA GPU (Titan V) on a Linux box (Ubuntu 16.04) with the latest driver and CUDA 9.

Again, most leaks are associated with the clCreateContextFromType line, which I assume means some GPU memory is not being released, but I did release all GPU resources at the end of the host code.

Do you notice anything that I missed in my host code? Your input is much appreciated.

counts |        error message
------------------------------------------------------------------------------------
    380 ==27828==    by 0x402C77: main (mmc.c:67)
Code: entry point to the errors below

     64 ==27828==    by 0x41CF02: mcx_list_gpu (mmc_cl_utils.c:135)
Code: OCL_ASSERT((clGetPlatformIDs(0, NULL, &numPlatforms)));

      4 ==27828==    by 0x41D032: mcx_list_gpu (mmc_cl_utils.c:154)
Code: context=clCreateContextFromType(cps,devtype[j],NULL,NULL,&status);

     58 ==27828==    by 0x41DF8A: mmc_run_cl (mmc_cl_host.c:111)
Code: entry point to the errors below

    438 ==27828==    by 0x41E006: mmc_run_cl (mmc_cl_host.c:124)
Code: OCL_ASSERT(((mcxcontext=clCreateContextFromType(cprops,CL_DEVICE_TYPE_ALL,...));

     13 ==27828==    by 0x41E238: mmc_run_cl (mmc_cl_host.c:144)
Code: OCL_ASSERT(((mcxqueue[i]=clCreateCommandQueue(mcxcontext,devices[i],prop,&status),status)));

      1 ==27828==    by 0x41E7A6: mmc_run_cl (mmc_cl_host.c:224)
Code:  OCL_ASSERT(((gprogress[0]=clCreateBufferNV(mcxcontext,CL_MEM_READ_WRITE, NV_PIN, ...);

      1 ==27828==    by 0x41E7F9: mmc_run_cl (mmc_cl_host.c:225)
Code: progress = (cl_uint *)clEnqueueMapBuffer(mcxqueue[0], gprogress[0], CL_TRUE, ...);

     10 ==27828==    by 0x41EDFA: mmc_run_cl (mmc_cl_host.c:290)
Code: status=clBuildProgram(mcxprogram, 0, NULL, opt, NULL, NULL);

      7 ==27828==    by 0x41F95C: mmc_run_cl (mmc_cl_host.c:417)
Code: OCL_ASSERT((clEnqueueReadBuffer(mcxqueue[devid],greporter[devid],CL_TRUE,0,...));

Update [04/11/2020]:

Reading @doqtor's comment, I ran the following test on 5 different devices: 2 NVIDIA GPUs, 2 AMD GPUs, and 1 Intel CPU. What he said is correct: the memory leak does not happen with the Intel OpenCL library, and I found that the AMD OpenCL driver is fine too. The only problem is that the NVIDIA OpenCL library appears to leak on both GPUs I tested (Titan V and RTX2080).

My test results are below. The memory/CPU profiling was done with psrecord, introduced in this post.
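Roughly, psrecord can be pointed at an mmc run like this (the mmc arguments and the -n value are whatever workload you want to profile; the -n 1e6 here is just an example):

pip install psrecord matplotlib
psrecord "../../src/bin/mmc -f cube2.inp -G 1 -s cube2 -n 1e6 -b 0 -D TP -M G -F bin" --interval 1 --plot memory_profile.png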

I will open a new question with a bounty on how to reduce this memory leak with NVIDIA OpenCL. If you have any experience with this, please share; I will post the link below. Thanks.

Solution

I double checked all my OpenCL variables, command queues, kernels and programs, and made sure that they are all properly released...

Well, I still found one (tiny) memory leak in the mmc code:

==15320== 8 bytes in 1 blocks are definitely lost in loss record 14 of 1,905
==15320==    at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==15320==    by 0x128D48: mmc_run_cl (mmc_cl_host.c:137)
==15320==    by 0x11E71E: main (mmc.c:67)

The memory allocated for greporter isn't freed, so that is for you to fix.
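A minimal sketch of the fix, assuming greporter is a host-side array of cl_mem handles allocated with malloc at mmc_cl_host.c:137 (devcount is a placeholder for whatever loop bound the code actually uses):

for (i = 0; i < devcount; i++)
    clReleaseMemObject(greporter[i]);   /* release the per-device buffers, if not done elsewhere */
free(greporter);                        /* frees the 8-byte host array valgrind reports as lost */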

The rest are potential memory leaks in the OpenCL library. They may or may not be actual leaks; for example, the library may use custom memory allocators that valgrind does not recognize, or it may do some other tricks. There are many threads about that.

In general, you can't do much about those unless you want to dive into the library code and do something about it. I would suggest carefully suppressing the reports that come from the library. A suppression file can be generated as described in the valgrind manual: https://valgrind.org/docs/manual/manual-core.html#manual-core.suppress
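For example, valgrind can write suppression templates for you with --gen-suppressions, which you can then load on later runs; a hand-written entry for the NVIDIA ICD might look roughly like the one below (the libnvidia-opencl.so object name is an assumption, check what is actually loaded on your system):

valgrind --leak-check=full --gen-suppressions=all ../../src/bin/mmc ... 2> raw_suppressions.txt
valgrind --leak-check=full --suppressions=opencl.supp ../../src/bin/mmc ...

# example entry in opencl.supp
{
   nvidia_opencl_leaks
   Memcheck:Leak
   match-leak-kinds: all
   ...
   obj:*libnvidia-opencl.so*
}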

... but still, when testing on sample programs, every call to the mmc_run_cl() function bumps up memory by 300MB-400MB and won't release at return

How did you check that? I haven't seen memory growing suspiciously. I set -n 1000e4 and it ran for about 2 minutes, during which the allocated memory stayed at ~0.6% of my RAM size the whole time. Note that I didn't use NVIDIA CUDA but POCL on an Intel GPU and CPU, linked against the libOpenCL installed from the ocl-icd-libopencl1:amd64 package on Ubuntu 18.04. So you may try giving that a go and check whether it changes anything.
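If you want to try the same setup, something like this should do on Ubuntu 18.04 (pocl-opencl-icd and ocl-icd-opencl-dev are the package names I would expect; verify with apt):

sudo apt-get install ocl-icd-libopencl1 ocl-icd-opencl-dev pocl-opencl-icd
ldd ../../src/bin/mmc | grep -i libopencl   # check which libOpenCL.so the binary resolves to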

======== Update ================================

I've re-run it as you described in the comment: after the first iteration the memory usage was 0.6%, after the 2nd iteration it increased to 0.9%, and the subsequent iterations didn't increase memory usage further. Valgrind also didn't report anything new beyond what I observed earlier. So I would suggest linking against a libOpenCL other than nvidia-cuda's and retesting.
