Proper Thrust call for subtraction


Problem description

This follows on from an earlier question.

Assume that dev_X is a thrust::device_vector, filled as follows:

int * X = (int*) malloc( ThreadsPerBlockX * BlocksPerGridX * sizeof(*X) );

// fill the host array with 0, 1, 2, ...
for ( int i = 0; i < ThreadsPerBlockX * BlocksPerGridX; i++ )
    X[ i ] = i;

// create the device vector
thrust::device_vector<int> dev_X ( ThreadsPerBlockX * BlocksPerGridX );

// copy to device
thrust::copy( X , X + ThreadsPerBlockX * BlocksPerGridX , dev_X.begin() );

The following call performs the subtraction dev_Kx - dev_X:

   thrust::transform( dev_Kx.begin(), dev_Kx.end(), dev_X.begin() , distX.begin() , thrust::minus<float>() );

I want to use the whole dev_Kx vector (which the call does, since the range goes from .begin() to .end()) and the whole dev_X vector.

The call above only passes dev_X.begin().

Does that mean it will use the whole dev_X vector, starting from the beginning? Or do I have to pass an extra argument pointing to dev_X.end()? (The function call above does not seem to accept such an extra argument.)

Also, suppose I want to use:

thrust::transform( dev_Kx.begin(), dev_Kx.begin() + i , dev_X.begin() , distX.begin() , thrust::minus<int>() );

Then dev_Kx would go from 0 to i, but what about dev_X.begin()? Will it use the same length (0 to i), or the full length of dev_X?

Solution

Many Thrust (and standard library) algorithms take a range as their first two parameters and then assume that every other iterator is backed by a container holding at least as many elements. A range is a pair of iterators marking the beginning and the end of a sequence.

For example:

thrust::copy(
    X.begin(),    // begin input iterator
    X.end(),      // end input iterator
    dev_X.begin() // begin output iterator
);

This copies the entire contents of X into dev_X. Why is dev_X.end() not needed? Because Thrust requires that you, the programmer, take care of sizing dev_X so that it can hold at least as many elements as there are in the input range. If you don't meet that guarantee, the behavior is undefined.
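
As a minimal sketch of that sizing requirement, the snippet below uses std::copy and std::vector on the host, which follow the same begin/end/output-begin convention as thrust::copy. The helper name copy_with_sizing is made up for illustration and is not part of Thrust or the standard library.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: copies src into dst, growing dst first so that the
// output iterator is backed by enough elements.
inline void copy_with_sizing(const std::vector<int>& src, std::vector<int>& dst) {
    if (dst.size() < src.size()) {
        dst.resize(src.size());  // without this, std::copy would write past the end
    }
    // Only the input range has an explicit end; the output is just a start position.
    std::copy(src.begin(), src.end(), dst.begin());
    assert(dst.size() >= src.size());
}
```

If dst is already larger than src, only the first src.size() elements are overwritten and the rest keep their old values.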

When you do this:

thrust::transform(
    dev_Kx.begin(), // begin input (1) iterator
    dev_Kx.end(),   // end input (1) iterator
    dev_X.begin(),  // begin input (2) iterator
    distX.begin(),  // output iterator
    thrust::minus<float>()
);

What thrust sees is an input range from dev_Kx.begin() to dev_Kx.end(). It has an explicit size of dev_Kx.end() - dev_Kx.begin(). Why are dev_X.end() and distX.end() not needed? Because they have an implicit size of dev_Kx.end() - dev_Kx.begin() too. For example, if there are 10 elements in dev_Kx, then transform will:

  • Use the 10 elements of dev_Kx
  • Use 10 elements of dev_X (which must hold at least 10 elements)
  • Perform the subtraction and store the 10 results in distX, which must be able to hold at least 10 elements.
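
The three points above can be sketched on the host with std::transform, which takes its iterators in the same order as thrust::transform. The function name subtract_ranges and the use of std::vector instead of thrust::device_vector are assumptions made to keep the example self-contained.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical helper mirroring the call in the question:
// dist[k] = kx[k] - x[k] for every k in kx.
// x may be longer than kx; its extra elements are simply never read.
inline std::vector<float> subtract_ranges(const std::vector<float>& kx,
                                          const std::vector<float>& x) {
    assert(x.size() >= kx.size());        // the caller's responsibility
    std::vector<float> dist(kx.size());   // output sized to the input range
    std::transform(kx.begin(), kx.end(),  // explicit range: kx.size() elements
                   x.begin(),             // implicit range of the same length
                   dist.begin(),          // implicit range of the same length
                   std::minus<float>());
    return dist;
}
```

With 10 elements in kx and 12 in x, the result holds exactly 10 differences.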

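Maybe looking at the implementation would clear up any doubts. Here is a simplified, sequential sketch (Thrust may run the operation in parallel, but the iterator contract is the same):

template <typename InputIt1, typename InputIt2, typename OutputIt, typename BinaryFunction>
void transform(InputIt1 input1_begin, InputIt1 input1_end,
               InputIt2 input2_begin, OutputIt output,
               BinaryFunction op) {
    // The first range alone decides how many elements are processed.
    while (input1_begin != input1_end) {
        *output++ = op(*input1_begin++, *input2_begin++);
    }
}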

Notice how only one end iterator is needed.
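
The same reasoning covers the partial-range case from the question: if the first range stops after i elements, only i elements of the second input are read and only i results are written. Here is a hedged host-side sketch using std::transform; subtract_first_n is a hypothetical name, and with Thrust the corresponding range would be dev_Kx.begin() to dev_Kx.begin() + i.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: subtracts only the first n elements of kx and x,
// leaving the remaining elements of dist untouched.
inline void subtract_first_n(const std::vector<int>& kx,
                             const std::vector<int>& x,
                             std::vector<int>& dist,
                             std::size_t n) {
    assert(n <= kx.size() && n <= x.size() && n <= dist.size());
    const auto count = static_cast<std::ptrdiff_t>(n);
    std::transform(kx.begin(), kx.begin() + count,  // explicit range of n elements
                   x.begin(),                       // reads exactly n elements
                   dist.begin(),                    // writes exactly n elements
                   std::minus<int>());
}
```

So the length always comes from the first range, never from the size of the second input or the output container.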


On an unrelated note, the following:

int * X = (int*) malloc( ThreadsPerBlockX * BlocksPerGridX * sizeof(*X) );
for ( int i = 0; i < ThreadsPerBlockX * BlocksPerGridX; i++ )
    X[ i ] = i;

could be rewritten in more idiomatic, less error-prone C++ as:

#include <numeric>  // std::iota
#include <vector>

std::vector<int> X(ThreadsPerBlockX * BlocksPerGridX);
std::iota(X.begin(), X.end(), 0);  // fills X with 0, 1, 2, ...
