Incomplete output from printf() called on device

Question

To test printf() calls on the device, I wrote a simple program that copies a moderately sized array to the device and prints the values of the device array to the screen. Although the array is copied to the device correctly, printf() does not work as expected: the first several hundred numbers are missing from the output. The array size in the code is 4096. Is this a bug, or am I not using the function properly? Thanks in advance.

My GPU is a GeForce GTX 550i, with compute capability 2.1.

My code:

#include<stdio.h>
#include<stdlib.h>
#define N 4096

__global__ void Printcell(float *d_Array , int n){
    int k = 0;

    printf("\n=========== data of d_Array on device==============\n");
    for( k = 0; k < n; k++ ){
        printf("%f  ", d_Array[k]);
        if((k+1)%6 == 0) printf("\n");
    }
    printf("\n\nTotally %d elements has been printed", k);
}

int main(){

    int i =0;

    float Array[N] = {0}, rArray[N] = {0};
    float *d_Array;
    for(i=0;i<N;i++)
        Array[i] = i;


    cudaMalloc((void**)&d_Array, N*sizeof(float));
    cudaMemcpy(d_Array, Array, N*sizeof(float), cudaMemcpyHostToDevice);
    cudaDeviceSynchronize();
    Printcell<<<1,1>>>(d_Array, N);    //Print the device array by a kernel
    cudaDeviceSynchronize();

    /* Copy the device array back to host to see if it was correctly copied */   
    cudaMemcpy(rArray, d_Array, N*sizeof(float), cudaMemcpyDeviceToHost);

    printf("\n\n");

    for(i=0;i<N;i++){
        printf("%f  ", rArray[i]);
        if((i+1)%6 == 0) printf("\n");
    }
}

Recommended answer

printf from the device writes to a queue of limited size. It is intended for small-scale, debug-style output, not large-scale output.

From the programming guide:

The output buffer for printf() is set to a fixed size before kernel launch (see Associated Host-Side API). It is circular and if more output is produced during kernel execution than can fit in the buffer, older output is overwritten.

Your in-kernel printf output overran the buffer, so the first printed elements were lost (overwritten) before the buffer was dumped into the standard I/O queue.
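One way to stay within the buffer without changing its size is to print the array in smaller pieces and let the buffer drain between kernel launches. The following is a minimal sketch, not the asker's code: the kernel name printChunk and the chunk size CHUNK are hypothetical, and the value 256 is only an assumption that a chunk's output fits in the buffer on the target device. It relies on the documented behavior that the device printf buffer is flushed at synchronization points such as cudaDeviceSynchronize().

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 4096
#define CHUNK 256  // hypothetical chunk size; choose one whose output fits the printf buffer

// Prints d_array[begin .. end-1] from a single thread (illustrative helper).
__global__ void printChunk(const float *d_array, int begin, int end) {
    for (int k = begin; k < end; ++k) {
        printf("%f  ", d_array[k]);
        if ((k + 1) % 6 == 0) printf("\n");
    }
}

int main() {
    static float h_array[N];
    for (int i = 0; i < N; ++i) h_array[i] = (float)i;

    float *d_array = NULL;
    if (cudaMalloc((void **)&d_array, N * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemcpy(d_array, h_array, N * sizeof(float), cudaMemcpyHostToDevice);

    // One launch per chunk; the synchronize call after each launch flushes
    // the device printf buffer before the next chunk starts filling it.
    for (int begin = 0; begin < N; begin += CHUNK) {
        int end = (begin + CHUNK < N) ? begin + CHUNK : N;
        printChunk<<<1, 1>>>(d_array, begin, end);
        cudaDeviceSynchronize();
    }
    printf("\n");

    cudaFree(d_array);
    return 0;
}
```

The trade-off is extra launch and synchronization overhead, which is usually acceptable for debugging output.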

The linked documentation also indicates that the buffer size can be increased.
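As a hedged illustration of that host-side API, the sketch below queries and raises the printf buffer size with cudaDeviceGetLimit() and cudaDeviceSetLimit() using the cudaLimitPrintfFifoSize limit. The 8 MB figure is an arbitrary value chosen for the example, not a recommendation from the answer, and the call has to happen before the kernel that prints is launched.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // Query the current size of the device printf buffer.
    size_t fifoBytes = 0;
    cudaDeviceGetLimit(&fifoBytes, cudaLimitPrintfFifoSize);
    printf("printf buffer before: %zu bytes\n", fifoBytes);

    // Request a larger buffer. 8 MB is an assumption made for illustration;
    // size it to the amount of output the kernel is expected to produce.
    const size_t requested = 8u * 1024u * 1024u;
    cudaError_t err = cudaDeviceSetLimit(cudaLimitPrintfFifoSize, requested);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaDeviceSetLimit failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaDeviceGetLimit(&fifoBytes, cudaLimitPrintfFifoSize);
    printf("printf buffer after:  %zu bytes\n", fifoBytes);

    // Kernels that call printf() should be launched after this point,
    // e.g. Printcell<<<1,1>>>(d_Array, N); from the question.
    return 0;
}
```

In the question's program, the cudaDeviceSetLimit() call would go in main() before the Printcell launch.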
