Multi-Threaded CPU CUDA application not asynchronous when calling CudaFree


Problem description

I have an application that is made up of multiple CPU threads, whereby each CPU thread creates a separate cudaStream in the same cudaContext on my GPU. I have a Tesla K20c. I'm using Windows 7 64-bit and CUDA 5.5.

Here is my code:

#include "gpuCode.cuh"

__global__ void kernelAddConstant1(int *g_a, const int b)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    g_a[idx] += b;
    for (int i = 0; i < 4000000; i++) // artificial busy-work to make the kernel run longer
    {
        if (i%2 == 0)
        {
            g_a[idx] += 5;
        }
        else
        {
            g_a[idx] -= 5;
        }
    }
}


// a predicate that checks whether each array element is set to its index plus b
int correctResult(int *data, const int n, const int b)
{
    for (int i = 0; i < n; i++)
    {
        if (data[i] != i + b)
        {
            return 0;
        }
    }
    return 1; // all elements match
}

int gpuDo()
{
    cudaSetDevice(0);
    cudaStream_t stream;
    cudaStreamCreate( &stream );

    int *a;
    int *d_a;

    unsigned int n;
    unsigned int nbytes;

    int b;

    n = 2 * 8192/16;
    nbytes = n * sizeof(int);
    b = 7;      // value by which the array is incremented

    cudaHostAlloc( (void**)&a, nbytes, cudaHostAllocDefault ) ;
    cudaMalloc((void **)&d_a, nbytes);

    for (unsigned int i = 0; i < n; i++)
        a[i] = i;

    unsigned int nbytes_per_kernel = nbytes;
    dim3 gpu_threads(128);  // 128 threads per block
    dim3 gpu_blocks(n / gpu_threads.x);

    cudaMemsetAsync(d_a, 0, nbytes_per_kernel, stream);

    cudaMemcpyAsync(d_a, a, nbytes_per_kernel, cudaMemcpyHostToDevice, stream);


    kernelAddConstant1<<<gpu_blocks, gpu_threads, 0, stream>>>(d_a, b);

    cudaMemcpyAsync(a, d_a, nbytes_per_kernel, cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize ( stream ) ;
    cudaStreamDestroy(stream);

    //cudaFree(d_a);

    int bResult = correctResult(a, n, b);

    //if (a)
        //cudaFreeHost(a); // free CPU memory

    return bResult;
}

void gpuEnd()
{
    cudaDeviceReset();
}

When I leave cudaFree and cudaFreeHost commented out, I achieve the following result:

[profiler timeline screenshot]

This is perfect, except that I have a memory leak because I'm not using cudaFree and cudaFreeHost. When I do use cudaFree and cudaFreeHost, I get the following result:

[profiler timeline screenshot]

This is bad. When cudaFree is used, some streams wait for others to finish first while other streams still work asynchronously. I'm assuming this is because cudaFree is not asynchronous, which is fine, but that doesn't explain why it sometimes works (as in the first three kernel calls) and not at other times. If cudaFree is called while the GPU is already busy doing something else, is it possible to have the CPU continue computing and let the free happen automatically at the first chance it gets? Is there another way to approach this issue? Thanks for any help you can give!

Recommended answer

Yes, cudaFree is not asynchronous. Neither is cudaMalloc.

Do all of your allocations up front, before your timing-critical code, and do the free operations at the end.

This should be particularly easy in your case, since the size of the allocation is the same each time.

Same comments apply to stream creation. I wouldn't bother creating and destroying them on the fly. Create however many you want, and reuse them until you're done.
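
For example, here is a minimal sketch of that restructuring, assuming one context per CPU thread. It reuses kernelAddConstant1 and correctResult from the code above; the GpuContext struct and the gpuInit/gpuRun/gpuCleanup functions are illustrative names, not part of the CUDA API.

struct GpuContext               // per-CPU-thread resources, created once and reused
{
    cudaStream_t stream;
    int *a;                     // pinned host buffer
    int *d_a;                   // device buffer
    unsigned int n;
    unsigned int nbytes;
};

void gpuInit(GpuContext &ctx)
{
    cudaSetDevice(0);
    ctx.n = 2 * 8192 / 16;
    ctx.nbytes = ctx.n * sizeof(int);

    // Blocking calls (stream creation, allocation) happen once, up front.
    cudaStreamCreate(&ctx.stream);
    cudaHostAlloc((void**)&ctx.a, ctx.nbytes, cudaHostAllocDefault);
    cudaMalloc((void**)&ctx.d_a, ctx.nbytes);
}

int gpuRun(GpuContext &ctx, int b)
{
    for (unsigned int i = 0; i < ctx.n; i++)
        ctx.a[i] = i;

    dim3 gpu_threads(128);
    dim3 gpu_blocks(ctx.n / gpu_threads.x);

    // Only asynchronous calls in the timing-critical path, so this thread
    // never stalls the other threads' streams on an allocation or a free.
    cudaMemsetAsync(ctx.d_a, 0, ctx.nbytes, ctx.stream);
    cudaMemcpyAsync(ctx.d_a, ctx.a, ctx.nbytes, cudaMemcpyHostToDevice, ctx.stream);
    kernelAddConstant1<<<gpu_blocks, gpu_threads, 0, ctx.stream>>>(ctx.d_a, b);
    cudaMemcpyAsync(ctx.a, ctx.d_a, ctx.nbytes, cudaMemcpyDeviceToHost, ctx.stream);
    cudaStreamSynchronize(ctx.stream);

    return correctResult(ctx.a, ctx.n, b);
}

void gpuCleanup(GpuContext &ctx)
{
    // Blocking cleanup is deferred until all the timed work is finished.
    cudaStreamDestroy(ctx.stream);
    cudaFree(ctx.d_a);
    cudaFreeHost(ctx.a);
}

Each CPU thread would call gpuInit once, call gpuRun as often as it needs to, and call gpuCleanup only after its last run, so the blocking cudaMalloc/cudaFree calls never land in the middle of the concurrent work.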
