CUDA imes生成 [英] CUDA Primes Generation

查看：231 发布时间：2016/10/23 12:33:40 c++ c cuda gpu primes

本文介绍了CUDA imes生成的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我的CUDA程序停止工作（它不打印任何东西），因为数据大小增加超过260k。

My CUDA program stops working(it prints nothing) as data size increases over 260k.

有人可以告诉我为什么会发生这种情况吗？这是我第一个CUDA程序。如果我想要更大的素数，如何在CUDA上使用大于long long int的数据类型？

Can someone tell me why this is happening? This is my first CUDA program. And if I want bigger primes, how to use datatype larger than long long int on CUDA?

图形卡是GT425M。

The graphics card is GT425M.

#include<stdio.h>
#include<stdlib.h>
#include<cuda.h>
#define SIZE 250000
#define BLOCK_NUM 96
#define THREAD_NUM 1024
int data[SIZE];
__global__ static void sieve(int *num,clock_t* time){
    const int tid = threadIdx.x;
    const int bid = blockIdx.x;
    int tmp=bid*THREAD_NUM+tid;
    if(tid==0) time[bid] = clock();
    while(tmp<SIZE){
        int i=1;
        while(((2*tmp+3)*i+tmp+1)<SIZE){
            num[(2*tmp+3)*i+tmp+1] = 0;
            i++;
        }
        tmp+=BLOCK_NUM*THREAD_NUM;
    }
    if(tid==0) time[bid+BLOCK_NUM] = clock();
}
void GenerateNumbers(int *number,int size){
    for(int i=0;i<size;i++)
        number[i] = 2*i+1;
    number[0] = 2;
}
int main(){
    GenerateNumbers(data,SIZE);
    int *gpudata;
    clock_t* time;
    int cpudata[SIZE];
    cudaMalloc((void**)&gpudata,sizeof(int)*SIZE);
    cudaMalloc((void**)&time,sizeof(clock_t)*BLOCK_NUM*2);
    cudaMemcpy(gpudata,data,sizeof(int)*SIZE,cudaMemcpyHostToDevice);
    sieve<<<BLOCK_NUM,THREAD_NUM,0>>>(gpudata,time);
    clock_t time_used[BLOCK_NUM * 2];
    cudaMemcpy(&cpudata,gpudata,sizeof(int)*SIZE,cudaMemcpyDeviceToHost);
    cudaMemcpy(&time_used,time,sizeof(clock_t)*BLOCK_NUM*2,cudaMemcpyDeviceToHost);
    cudaFree(gpudata);
    for(int i=0;i<SIZE;i++)
        if(cpudata[i]!=0)
            printf("%d\t",cpudata[i]);
    clock_t min_start,max_end;
    min_start = time_used[0];
    max_end = time_used[BLOCK_NUM];
    for(int i=1;i<BLOCK_NUM;i++) {
        if(min_start>time_used[i])
            min_start=time_used[i];
        if(max_end<time_used[i+BLOCK_NUM])
            max_end=time_used[i+BLOCK_NUM];
    }
    printf("\nTime Cost: %d\n",max_end-min_start);
}

推荐答案

提供64位。没有比64位宽的内置非向量整数类型。但是，您可以轻松地构建您自己的128位整数类型。例如：

(unsigned) long long int provides 64-bits. There is no built-in non-vector integer type that is wider than 64 bits. However, you could easily build your own 128-bit integer type. For example:

typedef struct {
  unsigned long long int lo;
  unsigned long long int hi;
} my_uint128;

my_uint128 add_uint128 (my_uint128 a, my_uint128 b)
{
  my_uint128 res;
  res.lo = a.lo + b.lo;
  res.hi = a.hi + b.hi + (res.lo < a.lo);
  return res;
}

如果需要更高性能的解决方案，请考虑将128位整数uint4，并使用内联PTX来更有效地处理四个32位块之间的进位。 source

If a higher performance solution is desired, consider mapping a 128-bit integer to a uint4 and using inline PTX for more efficient handling of the carries between the four 32-bit chunks. source

这篇关于CUDA imes生成的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

CUDA imes生成 [英] CUDA Primes Generation

问题描述

推荐答案

相关文章

C/C++开发最新文章

热门教程

热门工具

登录关闭

CUDA imes生成 [英] CUDA Primes Generation

问题描述

推荐答案

相关文章

C/C++开发最新文章

热门教程

热门工具

登录 关闭

登录关闭