Redirecting CUDA printf to a C++ stream


Problem statement

I am working on a large project which uses a logger for debugging. Since I like to keep a trace of what's happening in some CUDA kernels, I tried to find a way to redirect the printf of my CUDA kernels to a stringstream (or any stream), which can then be forwarded to the logger.

Possible solution

I managed to do it by using the following code:

#include <cuda.h>
#include <stdio.h>
#include <unistd.h> // dup

#include <iostream>
#include <sstream> // stringstream
#include <fstream> // ofstream

const char* output_file = "printf_redirect.log";

__global__ void printf_redirect(int* src, int* res)
{
    res[threadIdx.x] = threadIdx.x;
    printf("  %i: Hello World!\n", res[threadIdx.x]);
}

int main()
{
    using namespace std;

    const unsigned int N = 2;

    // Note: dummy arrays are not actually used, but this should prevent the
    //       compiler from discarding the printf in the kernel.

    int *d_A, *d_B, *h_A, *h_B;
    size_t size = N * sizeof (int);
    cudaMalloc (&d_A, size);
    cudaMalloc (&d_B, size);
    h_A = (int*) malloc (size);
    h_B = (int*) malloc (size);
    cudaMemcpy (d_A, h_A, size, cudaMemcpyHostToDevice);

    std::cout << "std::cout - start" << std::endl;
    printf ("stdout - start\n");

    /// REGULAR PRINT
    // Print to regular stdout
    std::cout << "Output to stdout:" << std::endl;
    printf_redirect<<<1,1>>> (d_A, d_B);
    cudaDeviceSynchronize ();

    /// REDIRECTION TO STRINGSTREAM
    std::stringstream ss;
    // Redirect std::cout to a stringstream
    std::streambuf* backup_cout = std::cout.rdbuf ();
    std::cout.rdbuf (ss.rdbuf ());
    // Redirect stdout to a buffer
    char buf[1024] = "";
    int backup_stdout = dup (fileno (stdout));
    freopen ("/dev/null", "w", stdout);
    setbuf (stdout, buf);

    std::cout << "Redirected output:" << std::endl;
    printf_redirect<<<1,N>>> (d_A, d_B);
    cudaDeviceSynchronize ();

    // Add CUDA buffer to a stringstream
    ss << buf;

    // Write stringstream to file
    std::ofstream outFile;
    outFile.open (output_file);
    outFile << ss.str ();
    outFile.close ();

    /// RESET REDIRECTION
    // Redirect back to initial stdout: flush what is pending, detach the
    // local buffer, then point file descriptor 1 at the saved copy again
    fflush (stdout);
    setbuf (stdout, NULL);
    dup2 (backup_stdout, fileno (stdout));
    close (backup_stdout);
    // Redirect back to initial std::cout
    std::cout.rdbuf (backup_cout);

    std::cout << "std::cout - end" << std::endl;
    printf ("stdout - end\n");

    cudaMemcpy(h_B, d_B, size, cudaMemcpyDeviceToHost);

    cudaFree(d_A);
    cudaFree(d_B);
    free (h_A);
    free (h_B);
}

I used the following questions to achieve this:

Running the program, we get in the console:

std::cout - start
stdout - start
Output to stdout:
  0: Hello World!
std::cout - end
stdout - end

And in printf_redirect.log:

Redirected output:
  0: Hello World!
  1: Hello World!

Question

Is there any easier way to achieve this? (e.g. hidden CUDA option or neat C/C++ trick)

Note that the final solution will end up in a utility class to make this less verbose in the actual code.
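
For what it's worth, the utility class mentioned above could look roughly like the following. This is only a sketch under a few assumptions: `StdoutCapture` is a made-up name, it is POSIX-only (`dup`/`dup2`), and it assumes the CUDA runtime's flush of device `printf` output ends up on file descriptor 1, which matches what the program above observes but is not something I can guarantee for every setup. Instead of borrowing a stdio buffer, it points fd 1 at a temporary file and reads that file back.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <unistd.h>  // dup, dup2, close, lseek, read (POSIX)

// Hypothetical helper: while capturing, everything written to stdout
// (file descriptor 1) goes to a temporary file instead of the console.
class StdoutCapture {
public:
    StdoutCapture() : tmp_(std::tmpfile()), saved_fd_(-1) {
        assert(tmp_ != nullptr);
        std::fflush(stdout);                  // let earlier output reach the console
        saved_fd_ = dup(fileno(stdout));      // remember the real stdout
        assert(saved_fd_ >= 0);
        dup2(fileno(tmp_), fileno(stdout));   // fd 1 now points at the temp file
    }

    // Restore stdout and return everything captured. Safe to call twice.
    std::string stop() {
        if (saved_fd_ < 0) return captured_;  // already stopped
        std::fflush(stdout);                  // drain stdio's buffer into the temp file
        dup2(saved_fd_, fileno(stdout));      // fd 1 points at the console again
        close(saved_fd_);
        saved_fd_ = -1;

        // Read the temp file back through its descriptor.
        lseek(fileno(tmp_), 0, SEEK_SET);
        char chunk[256];
        ssize_t n;
        while ((n = read(fileno(tmp_), chunk, sizeof chunk)) > 0)
            captured_.append(chunk, static_cast<std::size_t>(n));
        std::fclose(tmp_);
        tmp_ = nullptr;
        return captured_;
    }

    ~StdoutCapture() { stop(); }

    StdoutCapture(const StdoutCapture&) = delete;
    StdoutCapture& operator=(const StdoutCapture&) = delete;

private:
    std::FILE* tmp_;
    int saved_fd_;
    std::string captured_;
};
```

In the program above this would wrap the second launch: construct a `StdoutCapture`, launch the kernel, call `cudaDeviceSynchronize()`, then feed the string returned by `stop()` to the stringstream or the logger. Going through a temporary file avoids the fixed 1024-byte buffer, at the cost of some file I/O.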

Solution

Device-side printf() implicitly serializes the threads that are printing, so you probably wouldn't want to use it in production code.

Device-side printf() works by having the kernel copy messages into a preallocated ring buffer. Upon implicit or explicit device synchronization (for example cudaDeviceSynchronize()), CUDA dumps the contents of the buffer to stdout and then clears it.

You could simply implement your own device printf(). Its performance would probably not be any worse than the built-in one. The only disadvantage is that you would have to pass the ring buffer to the kernel and add a call to process it after the kernel returns.

Implementation would be something like this:

  • Create a buffer with room for a fixed number of printf() formatting strings and associated 32-bit or 64-bit parameters.

  • Create a device function that uses atomicInc() to keep track of the current print location and takes a formatting string and parameters and copies them into the current location.

  • Pass the ring buffer to the kernel, which then passes it to the device function together with the print parameters.

  • Create a host function that takes the ring buffer, runs the formatting strings and parameters through host side sprintf() and passes the results to the logger.
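
To make the steps above a bit more concrete, here is a host-only C++ model of the idea, not actual device code. Everything in it is made up for illustration: the names `LogRecord`, `LogRing`, `ring_log` and `ring_drain`, the slot count, and the two-argument limit. `std::atomic::fetch_add` plus a modulo stands in for a counter advanced with atomicInc(); in a real kernel `ring_log` would be a `__device__` function and the ring would be copied back to the host before draining.

```cpp
#include <atomic>
#include <cassert>
#include <cstdio>
#include <cstring>
#include <ostream>
#include <sstream>  // std::ostringstream, a typical drain target

constexpr unsigned kSlots = 8;  // fixed number of records (illustrative)

// One slot: a copy of the format string plus two 64-bit arguments.
struct LogRecord {
    char      fmt[48];
    long long args[2];
};

struct LogRing {
    LogRecord slots[kSlots];
    std::atomic<unsigned> next{0};  // total records claimed so far
};

// Models the device function: claim a slot atomically, then copy the
// format string and arguments into it.
inline void ring_log(LogRing& ring, const char* fmt,
                     long long a0 = 0, long long a1 = 0) {
    assert(fmt != nullptr);
    unsigned slot = ring.next.fetch_add(1) % kSlots;  // wrap like a ring
    LogRecord& rec = ring.slots[slot];
    std::strncpy(rec.fmt, fmt, sizeof rec.fmt - 1);
    rec.fmt[sizeof rec.fmt - 1] = '\0';
    rec.args[0] = a0;
    rec.args[1] = a1;
}

// Host side: run each record through snprintf() and forward the text to
// any stream (a stringstream, a logger's stream, ...), oldest first.
inline void ring_drain(LogRing& ring, std::ostream& out) {
    unsigned written = ring.next.exchange(0);
    unsigned count = written < kSlots ? written : kSlots;
    unsigned start = written > kSlots ? written % kSlots : 0;
    for (unsigned i = 0; i < count; ++i) {
        const LogRecord& rec = ring.slots[(start + i) % kSlots];
        char line[128];
        std::snprintf(line, sizeof line, rec.fmt, rec.args[0], rec.args[1]);
        out << line;
    }
}
```

Since every argument is stored as a `long long`, format strings need `%lld` rather than `%i`. Calling `ring_log(ring, "  %lld: Hello World!\n", 1)` and then `ring_drain(ring, ss)` on a `std::ostringstream ss` leaves the formatted line in `ss`, ready to hand to the logger. If more than `kSlots` records are written between drains, the oldest ones are overwritten.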
