Proper thrust call for subtraction
Following from here.
Assuming that dev_X is a vector.
int * X = (int*) malloc( ThreadsPerBlockX * BlocksPerGridX * sizeof(*X) );
for ( int i = 0; i < ThreadsPerBlockX * BlocksPerGridX; i++ )
    X[ i ] = i;
// create the device vector
thrust::device_vector<int> dev_X( ThreadsPerBlockX * BlocksPerGridX );
// copy to device
thrust::copy( X, X + ThreadsPerBlockX * BlocksPerGridX, dev_X.begin() );
The following performs the subtraction dev_Kx - dev_X:
thrust::transform( dev_Kx.begin(), dev_Kx.end(), dev_X.begin(), distX.begin(), thrust::minus<float>() );
I want to use the whole dev_Kx vector (which it does, because it goes from .begin() to .end()) and the whole dev_X vector.
The above code only passes dev_X.begin(). Does that mean it will use the whole dev_X vector, starting from the beginning? Or do I have to pass an extra argument pointing to dev_X.end()? (In the function call above I can't simply add such an argument.)
Also, suppose I want to use:
thrust::transform( dev_Kx.begin(), dev_Kx.begin() + i, dev_X.begin(), distX.begin(), thrust::minus<int>() );
Then dev_Kx would be read from element 0 to i. What about dev_X.begin()? Will it use the same length (0 to i), or the length of dev_X?
Many thrust (and standard library) functions take a range as their first parameters and then assume that every other iterator is backed by a container of at least the same size. A range is a pair of iterators marking the beginning and end of a sequence.
For example:
thrust::copy(
    X.begin(),     // begin input iterator
    X.end(),       // end input iterator
    dev_X.begin()  // begin output iterator
);
This copies the entire contents of X into dev_X. Why is dev_X.end() not needed? Because thrust requires that you, the programmer, size dev_X properly so that it can hold at least as many elements as there are in the input range. If you don't meet that requirement, the behavior is undefined.
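The sizing requirement can be sketched on the host with std::copy, which follows the same iterator convention. This is only an illustration under stated assumptions: copy_checked is a hypothetical helper name, and std::vector stands in for thrust::device_vector.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: makes sure the destination is large enough
// before copying, because std::copy never grows its output.
void copy_checked(const std::vector<int>& src, std::vector<int>& dst) {
    if (dst.size() < src.size()) {
        dst.resize(src.size());
    }
    // This is the guarantee the caller owes to the algorithm.
    assert(dst.size() >= src.size());
    // Only the input range carries an end iterator; dst.end() is never passed.
    std::copy(src.begin(), src.end(), dst.begin());
}
```

A destination that is already larger than the source is left at its size; only its first src.size() elements are overwritten.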
When you do this:
thrust::transform(
    dev_Kx.begin(),  // begin input (1) iterator
    dev_Kx.end(),    // end input (1) iterator
    dev_X.begin(),   // begin input (2) iterator
    distX.begin(),   // output iterator
    thrust::minus<float>()
);
What thrust sees is an input range from dev_Kx.begin() to dev_Kx.end(). It has an explicit size of dev_Kx.end() - dev_Kx.begin(). Why are dev_X.end() and distX.end() not needed? Because they have an implicit size of dev_Kx.end() - dev_Kx.begin() too. For example, if there are 10 elements in dev_Kx, then transform will:
- Use the 10 elements of dev_Kx
- Use 10 elements of dev_X (which must hold at least 10 elements)
- Perform the subtraction and store the 10 results in distX, which must be able to hold at least 10 elements.
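The three steps above can be tried on the host with std::transform and std::minus, which mirror the Thrust call. This is a minimal sketch, assuming int elements; subtract is a hypothetical helper name and std::vector stands in for thrust::device_vector.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical helper: computes a[k] - b[k] for every element of `a`.
// `b` may be longer than `a`; the extra elements are simply never read.
std::vector<int> subtract(const std::vector<int>& a, const std::vector<int>& b) {
    assert(b.size() >= a.size());     // second input must cover the first range
    std::vector<int> out(a.size());   // output sized to the first range
    std::transform(a.begin(), a.end(),  // explicit range: sets the length
                   b.begin(),           // implicit length: a.size()
                   out.begin(),         // implicit length: a.size()
                   std::minus<int>());
    return out;
}
```

With 10 elements in `a` and 15 in `b`, the result has 10 elements and the last 5 elements of `b` are ignored.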
Maybe looking at the implementation would clear up any doubts. Here's some pseudocode:
template <class InputIterator1, class InputIterator2, class OutputIterator, class BinaryFunction>
void transform(InputIterator1 input1_begin, InputIterator1 input1_end,
               InputIterator2 input2_begin, OutputIterator output,
               BinaryFunction op) {
    // the loop is bounded only by the first input range
    while (input1_begin != input1_end) {
        *output++ = op(*input1_begin++, *input2_begin++);
    }
}
Notice how only one end iterator is needed.
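The same loop also covers the sub-range case from the question: if the first range stops after i elements, exactly i elements of the second input are read and i results are written. Below is a host-side sketch using std::transform as a stand-in for thrust::transform; subtract_prefix is a hypothetical helper name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: subtracts only the first `i` elements of `a` and `b`.
std::vector<int> subtract_prefix(const std::vector<int>& a,
                                 const std::vector<int>& b,
                                 std::size_t i) {
    assert(i <= a.size() && i <= b.size());
    std::vector<int> out(i);
    const auto n = static_cast<std::ptrdiff_t>(i);
    // The first range [begin, begin + i) fixes the length for all iterators.
    std::transform(a.begin(), a.begin() + n, b.begin(), out.begin(),
                   std::minus<int>());
    return out;
}
```

The length of `b` itself plays no role beyond having to be at least i.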
On an unrelated note, the following:
int * X = (int*) malloc( ThreadsPerBlockX * BlocksPerGridX * sizeof(*X) );
for ( int i = 0; i < ThreadsPerBlockX * BlocksPerGridX; i++ )
    X[ i ] = i;
could be rewritten as more idiomatic, less error-prone C++ (std::iota is declared in <numeric>):
std::vector<int> X(ThreadsPerBlockX * BlocksPerGridX);
std::iota(X.begin(), X.end(), 0);