Incomplete output from printf() called on device

Question

To test calling printf() on the device, I wrote a simple program that copies a moderately sized array to the device and prints the values of the device array to the screen. Although the array is copied to the device correctly, the printf() output is incomplete: the first several hundred numbers are missing. The array size in the code is 4096. Is this a bug, or am I not using this function properly? Thanks in advance.

Edit: My GPU is a GeForce GTX 550 Ti, with compute capability 2.1.

#include<stdio.h>
#include<stdlib.h>
#define N 4096

__global__ void Printcell(float *d_Array , int n){
    int k = 0;

    printf("\n=========== data of d_Array on device==============\n");
    for( k = 0; k < n; k++ ){
        printf("%f  ", d_Array[k]);
        if((k+1)%6 == 0) printf("\n");
    }
    printf("\n\nTotally %d elements has been printed", k);
}

int main(){

    int i =0;

    float Array[N] = {0}, rArray[N] = {0};
    float *d_Array;
    for(i=0;i<N;i++)
        Array[i] = i;


    cudaMalloc((void**)&d_Array, N*sizeof(float));
    cudaMemcpy(d_Array, Array, N*sizeof(float), cudaMemcpyHostToDevice);
    cudaDeviceSynchronize();
    Printcell<<<1,1>>>(d_Array, N);    //Print the device array by a kernel
    cudaDeviceSynchronize();

    /* Copy the device array back to host to see if it was correctly copied */   
    cudaMemcpy(rArray, d_Array, N*sizeof(float), cudaMemcpyDeviceToHost);

    printf("\n\n");

    for(i=0;i<N;i++){
        printf("%f  ", rArray[i]);
        if((i+1)%6 == 0) printf("\n");
    }
}


Recommended answer

printf from the device has a limited queue. It is intended for small-scale, debug-style output, not large-scale output.

From the programming guide:


The output buffer for printf() is set to a fixed size before kernel launch (see Associated Host-Side API). It is circular and if more output is produced during kernel execution than can fit in the buffer, older output is overwritten.
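
The "associated host-side API" mentioned in the quote is, as far as I know, cudaDeviceGetLimit() and cudaDeviceSetLimit() with the cudaLimitPrintfFifoSize limit. A minimal sketch of querying and enlarging the buffer follows. The 8 MiB figure is an arbitrary value chosen for illustration, not a recommendation, and the limit has to be set before launching any kernel that calls printf():

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    size_t fifoBytes = 0;

    // Query the current size of the device-side printf buffer.
    cudaError_t err = cudaDeviceGetLimit(&fifoBytes, cudaLimitPrintfFifoSize);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaDeviceGetLimit failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("printf buffer before: %zu bytes\n", fifoBytes);

    // Request a larger buffer. 8 MiB is an assumed, illustrative size.
    // This call must come before any kernel that uses printf() is launched.
    const size_t requested = 8u * 1024u * 1024u;
    err = cudaDeviceSetLimit(cudaLimitPrintfFifoSize, requested);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaDeviceSetLimit failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Read the limit back; the runtime may adjust the requested value.
    err = cudaDeviceGetLimit(&fifoBytes, cudaLimitPrintfFifoSize);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaDeviceGetLimit failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("printf buffer after:  %zu bytes\n", fifoBytes);
    return 0;
}
```

In the program from the question, the cudaDeviceSetLimit() call would go in main() before the Printcell launch. A larger buffer only postpones the overwrite, so it is still best suited to debug-sized output.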

Your in-kernel printf output overran the buffer, so the first printed elements were lost (overwritten) before the buffer was dumped into the standard I/O queue.
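
One possible workaround, not part of the original answer, is to print the array in smaller pieces and synchronize after each launch, since the buffer is flushed to the host at synchronization points. The sketch below assumes a hypothetical PrintChunk kernel and a CHUNK size of 256; whether a chunk of that size fits depends on the buffer size on your system.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 4096
#define CHUNK 256  // hypothetical chunk size; tune it to the printf buffer size

// Print elements [start, start + count) of the device array from one thread.
__global__ void PrintChunk(const float *d_Array, int start, int count) {
    for (int k = start; k < start + count; k++) {
        printf("%f  ", d_Array[k]);
        if ((k + 1) % 6 == 0) printf("\n");
    }
}

int main() {
    static float Array[N];
    for (int i = 0; i < N; i++) Array[i] = (float)i;

    float *d_Array = NULL;
    if (cudaMalloc((void **)&d_Array, N * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    if (cudaMemcpy(d_Array, Array, N * sizeof(float),
                   cudaMemcpyHostToDevice) != cudaSuccess) {
        fprintf(stderr, "cudaMemcpy failed\n");
        cudaFree(d_Array);
        return 1;
    }

    for (int start = 0; start < N; start += CHUNK) {
        int count = (N - start < CHUNK) ? (N - start) : CHUNK;
        PrintChunk<<<1, 1>>>(d_Array, start, count);
        // Synchronizing flushes the device printf buffer before the next
        // chunk is produced, so earlier output is not overwritten.
        cudaError_t err = cudaDeviceSynchronize();
        if (err != cudaSuccess) {
            fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
            cudaFree(d_Array);
            return 1;
        }
    }
    printf("\n");

    cudaFree(d_Array);
    return 0;
}
```

For output of this size, copying the array back with cudaMemcpy and printing it from the host, as the question's own code already does, is usually the simpler choice.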
