Trouble generating prime numbers with CUDA

Question

I am just getting started with CUDA, and after going over the vector sum tutorials here I thought I would try something from scratch to really get my legs under me.

That said, I don't know...

A plain English description of my code is as follows:

First there is a counterClass that has members num and count. By resetting count to 0 whenever count equals num, this counter class keeps track of the remainder when dividing by num as we iterate up through the integers.

I have two functions that I want to run in parallel. The first, called count, increments all my counters (in parallel). The second, called check, tests whether any of the counters reads 0 (in parallel). If a counter reads 0, its num divides n evenly, meaning that n isn't prime.

While I would like my code to print only prime numbers, it prints all the numbers...

The code is as follows:

#include <stdio.h>
#include <stdlib.h>

typedef struct{
    int num;
    int count;
} counterClass;

counterClass new_counterClass(counterClass aCounter, int by, int count){
    aCounter.num = by;
    aCounter.count = count%by;
    return aCounter;
}

__global__ void count(counterClass *Counters){
    int idx = threadIdx.x+blockDim.x*blockIdx.x;
    Counters[idx].count+=1;
    if(Counters[idx].count == Counters[idx].num){
        Counters[idx].count = 0;
    }
    __syncthreads();
}

__global__ void check(counterClass *Counters, bool *result){
    int idx = threadIdx.x+blockDim.x*blockIdx.x;
    if (Counters[idx].count == 0){
        *result = false;
    }
    __syncthreads();
}

int main(){
    int tPrimes = 5;    // Total Primes to Find
    int nPrimes = 1;    // Number of Primes Found
    bool  *d_result, h_result=true;
    counterClass *h_counters =(counterClass *)malloc(tPrimes*sizeof(counterClass));
    h_counters[0]=new_counterClass(h_counters[0], 2 , 0);
    counterClass *d_counters;
    int n = 2;
    cudaMalloc((void **)&d_counters, tPrimes*sizeof(counterClass));
    cudaMalloc((void **)&d_result, sizeof(bool));
    cudaMemcpy(d_counters, h_counters, tPrimes*sizeof(counterClass), cudaMemcpyHostToDevice);
    while(nPrimes<tPrimes){
        h_result=true;
        cudaMemcpy(d_result, &h_result, sizeof(bool), cudaMemcpyHostToDevice);
        n+=1;
        count<<<1,nPrimes>>>(d_counters);
        check<<<1,nPrimes>>>(d_counters,d_result);
        cudaMemcpy(&h_result, d_result, sizeof(bool), cudaMemcpyDeviceToHost);
        if(h_result){
            printf("%d\n", n);
            cudaMemcpy(h_counters, d_counters, tPrimes*sizeof(counterClass), cudaMemcpyDeviceToHost);
            h_counters[nPrimes]=new_counterClass(h_counters[nPrimes], n , 0);
            nPrimes += 1;
            cudaMemcpy(d_counters, h_counters, tPrimes*sizeof(counterClass), cudaMemcpyHostToDevice);
        }
    }
}
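
For comparison, the counter scheme described above can be sketched on the host alone, which gives a reference to check the GPU output against. This is only a sketch: the names Counter and primes_by_counters are hypothetical and not part of the program above, and the loop does serially what the count and check kernels are meant to do in parallel. It uses no CUDA calls, so it should build with nvcc or any C++11 compiler.

```cpp
// Host-only reference for the counter scheme described in the question.
// Counter and primes_by_counters are hypothetical names chosen for this sketch.
#include <vector>

struct Counter {
    int num;    // the prime this counter divides by
    int count;  // current value of n modulo num
};

// Returns the first `total` primes, keeping one wrap-around counter per prime found.
std::vector<int> primes_by_counters(int total) {
    std::vector<int> primes;
    std::vector<Counter> counters;
    if (total <= 0) return primes;

    int n = 2;
    primes.push_back(n);
    counters.push_back({2, 0});  // 2 % 2 == 0

    while (static_cast<int>(primes.size()) < total) {
        ++n;
        bool is_prime = true;
        // Same work as the count and check kernels, done one counter at a time.
        for (Counter &c : counters) {
            c.count += 1;
            if (c.count == c.num) c.count = 0;   // wrap so that count == n % num
            if (c.count == 0) is_prime = false;  // num divides n evenly
        }
        if (is_prime) {
            primes.push_back(n);
            counters.push_back({n, 0});  // n % n == 0
        }
    }
    return primes;
}
```

Calling primes_by_counters(5) yields 2, 3, 5, 7, 11. The program above seeds the counter for 2 and starts printing at 3, so it would be expected to print 3, 5, 7, 11 when both kernels take effect. Printing every n is what one would see if the kernels never modified the counters or the result flag.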

There are some similar questions, such as CUDA - Sieve of Eratosthenes division into parts (http://stackoverflow.com/questions/30962293/cuda-sieve-of-eratosthenes-division-into-parts), and good examples posted as questions by people seeking to improve their code, such as CUDA Primes Generation and Low performance in CUDA prime number generator. But reading through these hasn't helped me figure out what is going wrong in my code!

Any advice on how to debug more effectively while working with CUDA would be appreciated, and if you can point out what I am doing wrong (because I know it's not the computer's fault) you will have my respect forever.

Edit:

Apparently this issue is only happening for me, so perhaps it's the way I'm running my code...

$ nvcc parraPrimes.cu -o primes
$ ./primes
3
4
5
6

Additionally, using cuda-memcheck as recommended:

$ cuda-memcheck ./primes
========= CUDA-MEMCHECK
3
4
5
6
========= ERROR SUMMARY: 0 errors

The output of dmesg | grep NVRM is as follows:

[    3.480443] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  304.131  Sun Nov  8 21:43:33 PST 2015

nvidia-smi is not installed on my system.

Recommended answer

Installing the nvidia-cuda-toolkit package with apt does not install CUDA.

You can install CUDA from NVIDIA's website. (*Use the .deb)
