Recipe to copy 1D strided data with cudaMemcpy2D


Question

If one has two contiguous ranges of device memory, it is possible to copy memory from one to the other using cudaMemcpy.

   double* source = ...
   double* dest = ...
   // The third argument is a size in bytes, not an element count.
   cudaMemcpy(dest, source, N * sizeof(double), cudaMemcpyDeviceToDevice);

Now suppose that I want to copy source into dest, but reading every 2nd element of source and writing every 3rd element of dest. That is, dest[0] = source[0], dest[3] = source[2], dest[6] = source[4], and so on. Of course, a single plain cudaMemcpy cannot do this.

Intuitively, cudaMemcpy2D should be able to do the job, because strided elements can be seen as a column in a larger array. But cudaMemcpy2D has many input parameters that are hard to interpret in this context, such as pitch.

For example, I managed to use cudaMemcpy2D to reproduce the case where both strides are 1:

    cudaMemcpy2D(dest, 1, source, 1, 1, n*sizeof(T), cudaMemcpyDeviceToHost);

But I cannot figure out the general case, where dest_stride and source_stride differ from 1.

Is there a way to copy strided data to strided data with cudaMemcpy2D? In which order do I have to pass the known information about the layout, namely the two strides and sizeof(T)?

    cudaMemcpy2D(dest, ??, source, ???, ????, ????, cudaMemcpyDeviceToHost);

Recommended answer

A generic function for such a strided copy could look roughly like this:

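#include <cuda_runtime.h>

// Copies numElements elements of elementSize bytes each, reading every
// srcStride-th element of src and writing every dstStride-th element of dst.
cudaError_t cudaMemcpyStrided(
        void *dst, size_t dstStride,
        const void *src, size_t srcStride,
        size_t numElements, size_t elementSize, enum cudaMemcpyKind kind) {
    // Each element is treated as one row of a 2D array, so the pitch is the
    // distance in bytes between the starts of two consecutive elements.
    size_t srcPitchInBytes = srcStride * elementSize;
    size_t dstPitchInBytes = dstStride * elementSize;
    size_t width = 1 * elementSize;  // bytes copied per row: one element
    size_t height = numElements;     // number of rows: one per element
    return cudaMemcpy2D(
        dst, dstPitchInBytes,
        src, srcPitchInBytes,
        width, height,
        kind);
}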
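
To make the roles of pitch, width and height more concrete, here is a minimal host-only sketch in C++ that models the copy pattern with plain memcpy. It is not the CUDA implementation: memcpy2DModel, stridedCopyModel and runQuestionExample are hypothetical names introduced only for illustration, and the model assumes that each row is at most one pitch wide.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical host-only model of a pitched 2D copy: `height` rows of
// `widthBytes` bytes each, where consecutive rows start `dpitch` (destination)
// and `spitch` (source) bytes apart.
void memcpy2DModel(void *dst, std::size_t dpitch,
                   const void *src, std::size_t spitch,
                   std::size_t widthBytes, std::size_t height) {
    // Assumption of this model: a row never extends past the start of the next.
    assert(widthBytes <= dpitch && widthBytes <= spitch);
    auto *d = static_cast<unsigned char *>(dst);
    const auto *s = static_cast<const unsigned char *>(src);
    for (std::size_t row = 0; row < height; ++row) {
        std::memcpy(d + row * dpitch, s + row * spitch, widthBytes);
    }
}

// Strided copy expressed through the model: one element per row, so the
// pitch is the stride in bytes and the width is a single element.
void stridedCopyModel(double *dst, std::size_t dstStride,
                      const double *src, std::size_t srcStride,
                      std::size_t numElements) {
    memcpy2DModel(dst, dstStride * sizeof(double),
                  src, srcStride * sizeof(double),
                  sizeof(double), numElements);
}

// The case from the question: 3 elements, destination stride 3, source stride 2.
std::vector<double> runQuestionExample() {
    std::vector<double> src = {0, 1, 2, 3, 4, 5};
    std::vector<double> dst(9, 0.0);
    stridedCopyModel(dst.data(), 3, src.data(), 2, 3);
    return dst;  // dst[0] = src[0], dst[3] = src[2], dst[6] = src[4]
}
```

Calling runQuestionExample() yields a destination in which only indices 0, 3 and 6 were written, which is the pattern asked for in the question.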

For your example, it could be called as

cudaMemcpyStrided(dest, 3, source, 2, 3, sizeof(double), cudaMemcpyDeviceToDevice);
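
A strided copy of this kind touches the last element at index (numElements - 1) * stride, so both buffers have to be at least that long plus one element. The following host-only C++ sketch checks this before a call; requiredLength, stridedCopyFits and checkStridedCopy are hypothetical helpers, not part of the CUDA API.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: smallest array length (in elements) that contains
// every element touched when accessing `numElements` elements with `stride`.
constexpr std::size_t requiredLength(std::size_t numElements, std::size_t stride) {
    return numElements == 0 ? 0 : (numElements - 1) * stride + 1;
}

// Hypothetical helper: true if both arrays are long enough for the copy.
constexpr bool stridedCopyFits(std::size_t dstLength, std::size_t dstStride,
                               std::size_t srcLength, std::size_t srcStride,
                               std::size_t numElements) {
    return requiredLength(numElements, dstStride) <= dstLength &&
           requiredLength(numElements, srcStride) <= srcLength;
}

// Debug-time guard that could be placed in front of the actual copy.
inline void checkStridedCopy(std::size_t dstLength, std::size_t dstStride,
                             std::size_t srcLength, std::size_t srcStride,
                             std::size_t numElements) {
    assert(stridedCopyFits(dstLength, dstStride, srcLength, srcStride, numElements));
}

// For the call above: 3 elements with destination stride 3 and source stride 2.
static_assert(requiredLength(3, 3) == 7, "dest needs at least 7 elements");
static_assert(requiredLength(3, 2) == 5, "source needs at least 5 elements");
```

For the call above this means dest must hold at least 7 doubles and source at least 5.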


"Roughly", because I just translated it on the fly from the (Java/JCuda based) code that I tested it with:

import static jcuda.runtime.JCuda.cudaMemcpy2D;

import java.util.Arrays;
import java.util.Locale;

import jcuda.Pointer;
import jcuda.Sizeof;
import jcuda.runtime.cudaMemcpyKind;

public class JCudaStridedMemcopy {
    public static void main(String[] args) {

        int dstLength = 9;
        int srcLength = 6;
        int dstStride = 3;
        int srcStride = 2;
        int numElements = 3;
        runExample(dstLength, dstStride, srcLength, srcStride, numElements);

        dstLength = 9;
        srcLength = 12;
        dstStride = 3;
        srcStride = 4;
        numElements = 3;
        runExample(dstLength, dstStride, srcLength, srcStride, numElements);

        dstLength = 18;
        srcLength = 12;
        dstStride = 3;
        srcStride = 2;
        numElements = 6;
        runExample(dstLength, dstStride, srcLength, srcStride, numElements);

    }

    private static void runExample(int dstLength, int dstStride, int srcLength, int srcStride, int numElements) {
        double dst[] = new double[dstLength];
        double src[] = new double[srcLength];
        for (int i = 0; i < src.length; i++) {
            src[i] = i;
        }

        cudaMemcpyStrided(dst, dstStride, src, srcStride, numElements);

        System.out.println("Copy " + numElements + " elements");
        System.out.println("  to   array with length " + dstLength + ", with a stride of " + dstStride);
        System.out.println("  from array with length " + srcLength + ", with a stride of " + srcStride);

        System.out.println("");

        System.out.println("Destination:");
        System.out.println(toString2D(dst, dstStride));
        System.out.println("Flat: " + Arrays.toString(dst));

        System.out.println("");

        System.out.println("Source:");
        System.out.println(toString2D(src, srcStride));
        System.out.println("Flat: " + Arrays.toString(src));

        System.out.println("");
        System.out.println("Done");
        System.out.println("");

    }

    private static void cudaMemcpyStrided(double dst[], int dstStride, double src[], int srcStride, int numElements) {
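        // One element per row: the pitch is the stride in bytes,
        // the width is a single double, and the height is the element count.
        long srcPitchInBytes = srcStride * Sizeof.DOUBLE;
        long dstPitchInBytes = dstStride * Sizeof.DOUBLE;
        long width = 1 * Sizeof.DOUBLE;
        long height = numElements;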
        cudaMemcpy2D(
                Pointer.to(dst), dstPitchInBytes, 
                Pointer.to(src), srcPitchInBytes, 
                width, height,
                cudaMemcpyKind.cudaMemcpyHostToHost);
    }

    public static String toString2D(double[] a, long columns) {
        String format = "%4.1f ";
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < a.length; i++) {
            if (i > 0 && i % columns == 0) {
                sb.append("\n");
            }
            sb.append(String.format(Locale.ENGLISH, format, a[i]));
        }
        return sb.toString();
    }
}

To give an idea of what the function does, here is the output of the examples/test cases:

Copy 3 elements
  to   array with length 9, with a stride of 3
  from array with length 6, with a stride of 2

Destination:
 0.0  0.0  0.0 
 2.0  0.0  0.0 
 4.0  0.0  0.0 
Flat: [0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 4.0, 0.0, 0.0]

Source:
 0.0  1.0 
 2.0  3.0 
 4.0  5.0 
Flat: [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]

Done

Copy 3 elements
  to   array with length 9, with a stride of 3
  from array with length 12, with a stride of 4

Destination:
 0.0  0.0  0.0 
 4.0  0.0  0.0 
 8.0  0.0  0.0 
Flat: [0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 8.0, 0.0, 0.0]

Source:
 0.0  1.0  2.0  3.0 
 4.0  5.0  6.0  7.0 
 8.0  9.0 10.0 11.0 
Flat: [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0]

Done

Copy 6 elements
  to   array with length 18, with a stride of 3
  from array with length 12, with a stride of 2

Destination:
 0.0  0.0  0.0 
 2.0  0.0  0.0 
 4.0  0.0  0.0 
 6.0  0.0  0.0 
 8.0  0.0  0.0 
10.0  0.0  0.0 
Flat: [0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 4.0, 0.0, 0.0, 6.0, 0.0, 0.0, 8.0, 0.0, 0.0, 10.0, 0.0, 0.0]

Source:
 0.0  1.0 
 2.0  3.0 
 4.0  5.0 
 6.0  7.0 
 8.0  9.0 
10.0 11.0 
Flat: [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0]

Done
