Online compilation of a single CUDA function

Question

I have a function in my program called float valueAt(float3 v). It's supposed to return the value of a function at the given point. The function is user-specified. I have an interpreter for this function at the moment, but others recommended I compile the function online so it's in machine code and is faster.

How do I do this? I believe I know how to load the function when I have PTX generated, but I have no idea how to generate the PTX.

Answer

I've thought about this problem for a while, and while I don't think this is a "great" solution, it does seem to work so I thought I would share it.

The basic idea is to use linux to spawn processes to compile and then run the compiled code. I think this is pretty much a no-brainer, but since I put together the pieces, I'll post instructions here in case it's useful for somebody else.

The problem statement in the question is to be able to take a file that contains a user-defined function, let's assume it is a function of a single variable f(x), i.e. y = f(x), and that x and y can be represented by float quantities.

The user would edit a file called fx.txt that contains the desired function. This file must conform to C syntax rules.

fx.txt:

y=1/x

This file then gets included in the __device__ function that will be holding it:

user_testfunc.cuh:

__device__ float fx(float x){
  float y;
#include "fx.txt"
;
  return y;
}

which gets included in the kernel that is called via a wrapper.

cudalib.cu:

#include <math.h>
#include "cudalib.h"
#include "user_testfunc.cuh"

__global__ void my_kernel(float x, float *y){

  *y = fx(x);
}

float cudalib_compute_fx(float x){
  float *d, *h_d;
  h_d = (float *)malloc(sizeof(float));
  cudaMalloc(&d, sizeof(float));
  my_kernel<<<1,1>>>(x, d);
  cudaMemcpy(h_d, d, sizeof(float), cudaMemcpyDeviceToHost);
  float result = *h_d;
  cudaFree(d);   // release the buffers so repeated calls don't leak
  free(h_d);
  return result;
}

cudalib.h:

float cudalib_compute_fx(float x);

The above files get built into a shared library:

nvcc -arch=sm_20 -Xcompiler -fPIC -shared cudalib.cu -o libmycudalib.so

We need a main application to use this shared library.

t452.cu:

#include <stdio.h>
#include <stdlib.h>
#include "cudalib.h"

int main(int argc, char* argv[]){

  if (argc == 1){
    //  recompile lib, and spawn new process
    int retval = system("nvcc -arch=sm_20 -Xcompiler -fPIC -shared cudalib.cu -o libmycudalib.so");
    char scmd[128];
    snprintf(scmd, sizeof(scmd), "%s skip", argv[0]);
    retval = system(scmd);}
  else { // compute f(x) at x = 2.0
    printf("Result is: %f\n", cudalib_compute_fx(2.0));
    }
  return 0;
}

Compile it like this:

nvcc -arch=sm_20 -o t452 t452.cu -L. -lmycudalib



At this point, the main application (t452) can be executed and it will produce the result of f(2.0) which is 0.5 in this case:

$ LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH ./t452
Result is: 0.500000

Now edit the fx.txt file:

$ vi fx.txt
$ cat fx.txt
y = 5/x

And just re-run the app, and the new functional behavior is used:

$ LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH ./t452
Result is: 2.500000

This method takes advantage of the fact that upon recompilation/replacement of a shared library, a new linux process will pick up the new shared library. Also note that I've omitted several kinds of error checking for clarity. At a minimum I would check CUDA errors, and I would also probably delete the shared object (.so) library before recompiling it, and then test for its existence after compilation, to do a basic test that the compilation proceeded successfully.
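As a sketch of the minimal CUDA error checking mentioned above (the macro name and usage lines are illustrative, not from the original answer):

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Illustrative error-checking macro: wraps a runtime API call and
// aborts with file/line context if it did not return cudaSuccess.
#define CUDA_CHECK(call)                                           \
  do {                                                             \
    cudaError_t err_ = (call);                                     \
    if (err_ != cudaSuccess) {                                     \
      fprintf(stderr, "CUDA error: %s at %s:%d\n",                 \
              cudaGetErrorString(err_), __FILE__, __LINE__);       \
      exit(EXIT_FAILURE);                                          \
    }                                                              \
  } while (0)

// Hypothetical usage inside cudalib_compute_fx:
//   CUDA_CHECK(cudaMalloc(&d, sizeof(float)));
//   my_kernel<<<1,1>>>(x, d);
//   CUDA_CHECK(cudaGetLastError());   // catch kernel launch errors
//   CUDA_CHECK(cudaMemcpy(h_d, d, sizeof(float), cudaMemcpyDeviceToHost));
```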

This method entirely uses the runtime API to achieve this goal, so as a result the user would have to have the CUDA toolkit installed on their machine and appropriately set up so that nvcc is available in the PATH. Using the driver API with PTX code would make this process much cleaner (and not require the toolkit on the user's machine), but AFAIK there is no way to generate PTX from CUDA C without using nvcc or a user-created toolchain built on the nvidia llvm compiler tools. In the future, there may be a more "integrated" approach available in the "standard" CUDA C toolchain, or perhaps even by the driver.
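For reference, loading PTX with the driver API looks roughly like this. This is only a sketch under stated assumptions: error checks are omitted, `ptx_source` is assumed to hold a null-terminated PTX string, and the PTX is assumed to export a kernel named `my_kernel` with the same `(float, float *)` signature as above.

```cuda
#include <cuda.h>  // CUDA driver API

// Sketch: JIT-load a PTX string and launch its kernel once.
void launch_from_ptx(const char *ptx_source, float x, float *d_y) {
  CUdevice dev;
  CUcontext ctx;
  CUmodule mod;
  CUfunction fn;

  cuInit(0);
  cuDeviceGet(&dev, 0);
  cuCtxCreate(&ctx, 0, dev);
  cuModuleLoadData(&mod, ptx_source);        // JIT-compiles the PTX
  cuModuleGetFunction(&fn, mod, "my_kernel");

  void *args[] = { &x, &d_y };               // pointers to kernel params
  cuLaunchKernel(fn, 1, 1, 1,                // grid dims
                     1, 1, 1,                // block dims
                     0, 0, args, 0);         // shared mem, stream, params
  cuCtxSynchronize();

  cuModuleUnload(mod);
  cuCtxDestroy(ctx);
}
```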

A similar approach can be arranged using separate compilation and linking of device code, such that the only source code that needs to be exposed to the user is in user_testfunc.cu (and fx.txt).

There is now a CUDA runtime compilation facility (NVRTC), which should be used in place of the above.
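A minimal sketch of compiling a CUDA C source string to PTX with NVRTC (error handling omitted; the function name, file name, and architecture option are illustrative):

```cuda
#include <nvrtc.h>
#include <stdio.h>
#include <stdlib.h>

// Sketch: compile a CUDA C source string to PTX in-process with NVRTC.
// The caller frees the returned buffer.
char *compile_to_ptx(const char *source) {
  nvrtcProgram prog;
  nvrtcCreateProgram(&prog, source, "fx.cu", 0, NULL, NULL);

  const char *opts[] = { "--gpu-architecture=compute_50" };
  nvrtcCompileProgram(prog, 1, opts);

  size_t ptx_size;
  nvrtcGetPTXSize(prog, &ptx_size);
  char *ptx = (char *)malloc(ptx_size);
  nvrtcGetPTX(prog, ptx);          // PTX text, ready for cuModuleLoadData()

  nvrtcDestroyProgram(&prog);
  return ptx;
}
```

Link against `-lnvrtc`; the resulting PTX can then be loaded with the driver API, so `nvcc` is no longer needed on the user's machine.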
