Process strings from an OpenCL kernel

Question

I have some strings like:

std::string first, second, third; ...

My plan was to collect their addresses into a char* array:

char *addresses[] = {&first[0], &second[0], &third[0], /* ... */};

and pass the char **addresses to the OpenCL kernel.

There are a few issues and questions:

The main issue is that I cannot pass an array of pointers.

Is there a good way to use many strings from kernel code without copying them, leaving them in shared memory instead?

I'm using NVIDIA on Windows, so I can only use OpenCL 1.2.

I cannot concatenate the strings because they come from different structures...

According to the first answer, if I have this (example):

char *p; // my host char array (BUFFER_SIZE bytes) that gets copied into the buffer

cl_mem cmHostString = clCreateBuffer(myDev.getcxGPUContext(), CL_MEM_ALLOC_HOST_PTR, BUFFER_SIZE, NULL, &oclErr);

oclErr = clEnqueueWriteBuffer(myDev.getCqCommandQueue(), cmHostString, CL_TRUE, 0, BUFFER_SIZE, p, 0, NULL, NULL);

Do I need to copy each element of my char array from host memory to another part of host memory (whose new address is hidden from the host)? That doesn't seem logical to me. Why can't I use the same address? The GPU device could access host memory directly and use it.

Recommended answer

Is there a good way to use many strings from kernel code without copying them, leaving them in shared memory instead?

Not in OpenCL 1.2. Shared Virtual Memory (SVM) has only been available since OpenCL 2.0, which NVIDIA does not support as yet. You will need either to switch to a GPU that supports OpenCL 2.0 or, with OpenCL 1.2, to copy your strings into a contiguous array of characters and pass that array to the kernel (which means a copy).
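A minimal sketch of the OpenCL 1.2 approach, assuming C++17 and a kernel that only reads the strings. Because the device cannot dereference host pointers, the `char*` array is replaced by an offsets table: the strings are copied back to back into one `chars` array, `offsets[i]` marks where string `i` starts, and one extra entry marks the end. `PackedStrings`, `pack_strings` and `string_at` are hypothetical names; the sources are taken as pointers so they can come from different structures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// One contiguous char array plus a start-offset table, instead of host pointers.
struct PackedStrings {
    std::vector<char> chars;             // every string's bytes, back to back
    std::vector<std::uint32_t> offsets;  // n + 1 entries: string i is [offsets[i], offsets[i + 1])
};

// Copies each source string into the shared array and records its start offset.
PackedStrings pack_strings(const std::vector<const std::string*>& sources) {
    PackedStrings packed;
    packed.offsets.reserve(sources.size() + 1);
    for (const std::string* s : sources) {
        packed.offsets.push_back(static_cast<std::uint32_t>(packed.chars.size()));
        packed.chars.insert(packed.chars.end(), s->begin(), s->end());
    }
    // End sentinel, so the length of the last string is known too.
    packed.offsets.push_back(static_cast<std::uint32_t>(packed.chars.size()));
    return packed;
}

// The lookup a work-item would perform: base of the char buffer + offset, length from
// the next offset. No terminators are needed, so embedded '\0' bytes are fine.
std::string_view string_at(const PackedStrings& p, std::size_t i) {
    assert(i + 1 < p.offsets.size());  // debug-only bounds check
    return std::string_view(p.chars.data() + p.offsets[i], p.offsets[i + 1] - p.offsets[i]);
}
```

Both vectors could then be uploaded as two read-only buffers, for example with `clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, size, ptr, &err)`, and a kernel taking `__global const char*` and `__global const uint*` arguments would locate string `i` the same way `string_at` does. The offsets are 32-bit on the assumption that the packed data stays under 4 GB.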

EDIT: Responding to your edit, you can use:

  • the CL_MEM_ALLOC_HOST_PTR flag to create an empty buffer of the required size, then map that buffer using clEnqueueMapBuffer and fill it through the pointer returned by the mapping. After that, unmap the buffer using clEnqueueUnmapMemObject.
  • the CL_MEM_USE_HOST_PTR flag to create a buffer of the required size, passing in the pointer to your array of characters.
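With the first option, the strings can be written straight into the mapped region instead of into a separate host array first. A minimal sketch of the host-side part, assuming the kernel expects NUL-terminated strings: one pass computes the size to request from `clCreateBuffer`, a second pass copies into the pointer that `clEnqueueMapBuffer` would return. `packed_size` and `fill_mapped` are hypothetical helpers, and the OpenCL calls appear only in comments so the snippet builds without an OpenCL SDK.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Intended flow (OpenCL 1.2 calls shown as comments, error checks omitted):
//   std::size_t n = packed_size(sources);
//   cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_ALLOC_HOST_PTR, n, NULL, &err);
//   char* dst = (char*)clEnqueueMapBuffer(queue, buf, CL_TRUE, CL_MAP_WRITE,
//                                         0, n, 0, NULL, NULL, &err);
//   fill_mapped(dst, n, sources);
//   clEnqueueUnmapMemObject(queue, buf, dst, 0, NULL, NULL);

// Pass 1: total bytes needed, including one '\0' per string.
std::size_t packed_size(const std::vector<const std::string*>& sources) {
    std::size_t total = 0;
    for (const std::string* s : sources) total += s->size() + 1;
    return total;
}

// Pass 2: copy each string, NUL-terminated, into the destination region.
// Returns the number of bytes written.
std::size_t fill_mapped(char* dst, std::size_t capacity,
                        const std::vector<const std::string*>& sources) {
    std::size_t pos = 0;
    for (const std::string* s : sources) {
        assert(pos + s->size() + 1 <= capacity);  // debug-only overflow check
        std::memcpy(dst + pos, s->data(), s->size());
        pos += s->size();
        dst[pos++] = '\0';
    }
    return pos;
}
```

In a test without a GPU, any host buffer of `packed_size(...)` bytes can stand in for the mapped pointer.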

In my experience, a buffer created with the CL_MEM_USE_HOST_PTR flag is usually slightly faster; I think whether the data is really copied under the hood depends on the implementation. But to use it, you need to have your array of characters prepared on the host first.

You basically need to benchmark and see which is faster. Also, don't focus too much on data copying: with transfers running in the GB/s range, the copy time is usually tiny compared to how long the kernel takes to run (depending, of course, on what the kernel does).
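Since the choice comes down to measuring, here is a minimal timing helper one could wrap around each variant. It is a sketch using `std::chrono`, and `median_ms` is a hypothetical name; taking the median of several runs reduces the influence of one-off costs such as the first transfer or driver warm-up.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Runs `work` `reps` times and returns the median wall-clock time in milliseconds.
// For OpenCL, `work` should end with clFinish(queue) so the measurement covers the
// transfer or kernel actually completing, not merely being enqueued.
template <typename F>
double median_ms(F&& work, std::size_t reps) {
    assert(reps > 0);
    std::vector<double> samples;
    samples.reserve(reps);
    for (std::size_t i = 0; i < reps; ++i) {
        auto t0 = std::chrono::steady_clock::now();
        work();
        auto t1 = std::chrono::steady_clock::now();
        samples.push_back(std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
    std::sort(samples.begin(), samples.end());
    return samples[samples.size() / 2];  // upper median when reps is even
}
```

As an alternative, OpenCL's own event profiling (a queue created with `CL_QUEUE_PROFILING_ENABLE`, then `clGetEventProfilingInfo` with `CL_PROFILING_COMMAND_START` and `CL_PROFILING_COMMAND_END`) reports device-side time for individual commands.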
