Building a tiny R package with CUDA and Rcpp

Question

I'm working on a tiny R package that uses CUDA and Rcpp, adapted from the output of Rcpp.package.skeleton(). I will first describe what happens on the master branch for the commit entitled "fixed namespace". The package installs successfully if I forget CUDA (i.e., if I remove the src/Makefile, change src/rcppcuda.cu to src/rcppcuda.cpp, and comment out the code that defines and calls kernels). But as is, the compilation fails.

I also would like to know how to compile with a Makevars or Makevars.in instead of a Makefile, and in general, try to make this as platform independent as is realistic. I've read about Makevars in the R extensions manual, but I still haven't been able to make it work.

Some of you may suggest rCUDA, but what I'm really after here is improving a big package I've already been developing for some time, and I'm not sure switching would be worth starting over from scratch.

Anyway, here's what happens when I do an R CMD build and R CMD INSTALL on this one (master branch, commit entitled "fixed namespace").

* installing to library ‘/home/landau/.R/library’
* installing *source* package ‘rcppcuda’ ...
** libs
** arch - 
/usr/local/cuda/bin/nvcc -c rcppcuda.cu -o rcppcuda.o --shared -Xcompiler "-fPIC" -gencode arch=compute_20,code=sm_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -I/apps/R-3.2.0/include -I/usr/local/cuda/include 
rcppcuda.cu:1:18: error: Rcpp.h: No such file or directory
make: *** [rcppcuda.o] Error 1
ERROR: compilation failed for package ‘rcppcuda’
* removing ‘/home/landau/.R/library/rcppcuda’

...which is strange, because I do include Rcpp.h, and Rcpp is installed.

$ R

R version 3.2.0 (2015-04-16) -- "Full of Ingredients"
Copyright (C) 2015 The R Foundation for Statistical Computing
Platform: x86_64-unknown-linux-gnu (64-bit)

...

> library(Rcpp)
> sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-unknown-linux-gnu (64-bit)
Running under: CentOS release 6.6 (Final)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] Rcpp_0.11.6
> 

I'm using CentOS,

$ cat /etc/*-release
CentOS release 6.6 (Final)
LSB_VERSION=base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
CentOS release 6.6 (Final)
CentOS release 6.6 (Final)

CUDA version 6,

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2013 NVIDIA Corporation
Built on Thu_Mar_13_11:58:58_PDT_2014
Cuda compilation tools, release 6.0, V6.0.1

and I have access to 4 GPUs of the same make and model.

$ /usr/local/cuda/samples/bin/x86_64/linux/release/deviceQuery 
/usr/local/cuda/samples/bin/x86_64/linux/release/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 4 CUDA Capable device(s)

Device 0: "Tesla M2070"
  CUDA Driver Version / Runtime Version          6.0 / 6.0
  CUDA Capability Major/Minor version number:    2.0
  Total amount of global memory:                 5375 MBytes (5636554752 bytes)
  (14) Multiprocessors, ( 32) CUDA Cores/MP:     448 CUDA Cores
  GPU Clock rate:                                1147 MHz (1.15 GHz)
  Memory Clock rate:                             1566 Mhz
  Memory Bus Width:                              384-bit
  L2 Cache Size:                                 786432 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65535), 3D=(2048, 2048, 2048)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1536
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (65535, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Enabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Bus ID / PCI location ID:           11 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

...

> Peer access from Tesla M2070 (GPU0) -> Tesla M2070 (GPU1) : Yes
> Peer access from Tesla M2070 (GPU0) -> Tesla M2070 (GPU2) : Yes
> Peer access from Tesla M2070 (GPU0) -> Tesla M2070 (GPU3) : Yes
> Peer access from Tesla M2070 (GPU1) -> Tesla M2070 (GPU1) : No
> Peer access from Tesla M2070 (GPU1) -> Tesla M2070 (GPU2) : Yes
> Peer access from Tesla M2070 (GPU1) -> Tesla M2070 (GPU3) : Yes
> Peer access from Tesla M2070 (GPU2) -> Tesla M2070 (GPU1) : Yes
> Peer access from Tesla M2070 (GPU2) -> Tesla M2070 (GPU2) : No
> Peer access from Tesla M2070 (GPU2) -> Tesla M2070 (GPU3) : Yes
> Peer access from Tesla M2070 (GPU1) -> Tesla M2070 (GPU0) : Yes
> Peer access from Tesla M2070 (GPU1) -> Tesla M2070 (GPU1) : No
> Peer access from Tesla M2070 (GPU1) -> Tesla M2070 (GPU2) : Yes
> Peer access from Tesla M2070 (GPU2) -> Tesla M2070 (GPU0) : Yes
> Peer access from Tesla M2070 (GPU2) -> Tesla M2070 (GPU1) : Yes
> Peer access from Tesla M2070 (GPU2) -> Tesla M2070 (GPU2) : No
> Peer access from Tesla M2070 (GPU3) -> Tesla M2070 (GPU0) : Yes
> Peer access from Tesla M2070 (GPU3) -> Tesla M2070 (GPU1) : Yes
> Peer access from Tesla M2070 (GPU3) -> Tesla M2070 (GPU2) : Yes

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 6.0, CUDA Runtime Version = 6.0, NumDevs = 4, Device0 = Tesla M2070, Device1 = Tesla M2070, Device2 = Tesla M2070, Device3 = Tesla M2070
Result = PASS

It compiles for any commit after "fixed namespace" on either branch, but combining Rcpp and CUDA is still a problem.

To make the package compile, it turns out I just needed to split my C++ and CUDA code into separate *.cpp and *.cu files. However, when I try the "compiling cpp and cu separately" commit on the master branch, I get

> library(rcppcuda)
> hello()
An object of class "MyClass"
Slot "x":
 [1]  1  2  3  4  5  6  7  8  9 10

Slot "y":
 [1]  1  2  3  4  5  6  7  8  9 10

Error in .Call("someCPPcode", r) : 
  "someCPPcode" not resolved from current namespace (rcppcuda)
> 

The error goes away in the withoutCUDA branch in the commit entitled "adding branch withoutCUDA".

> library(rcppcuda)
> hello()
An object of class "MyClass"
Slot "x":
 [1]  1  2  3  4  5  6  7  8  9 10

Slot "y":
 [1]  1  2  3  4  5  6  7  8  9 10

[1] "Object changed."
An object of class "MyClass"
Slot "x":
 [1] 500   2   3   4   5   6   7   8   9  10

Slot "y":
 [1]    1 1000    3    4    5    6    7    8    9   10

> 

The only differences between the "compiling cpp and cu separately" commit on master and the "adding branch withoutCUDA" commit on withoutCUDA are

  • The Makefile and someCUDAcode.cu are gone from withoutCUDA.
  • In withoutCUDA, all references to someCUDAcode() are gone from someCPPcode.cpp.

Also, it would still be convenient to be able to use CUDA and Rcpp in the same *.cu file. I would really like to know how to fix the "fixed namespace" commit on the master branch.

Answer

Going through your package, there are multiple aspects that need to be changed.

  1. You shouldn't use a 'Makefile' but rather a 'Makevars' file, to improve compatibility for multiple architecture builds.
  2. Try to follow the standard variable names (e.g. CPPC should be CXX); this makes everything play together much better.
  3. Don't try to compile the shared object yourself; there are good macros within the base R makefile that make this much simpler (e.g. PKG_LIBS, OBJECTS, etc.).
  4. With multiple compilers, you will want to use the OBJECTS macro. Here you will override R's base attempt to set the object files to be linked (make sure you include them all).
  5. You also need (AFAIK) to make the CUDA functions available with extern "C". You will prefix both the function definition in the .cu file and its declaration at the start of your cpp file.

The following Makevars worked for me; I modified my CUDA_HOME, R_HOME, and RCPP_INC and have switched them back to yours here. Note that this is where a configure file is recommended to make the package as portable as possible.

CUDA_HOME = /usr/local/cuda
R_HOME = /apps/R-3.2.0
CXX = /usr/bin/g++

# This defines what the shared object libraries will be
PKG_LIBS= -L/usr/local/cuda-7.0/lib64 -Wl,-rpath,/usr/local/cuda-7.0/lib64 -lcudart -d


#########################################

R_INC = /usr/share/R/include
RCPP_INC = $(R_HOME)/library/Rcpp/include

NVCC = $(CUDA_HOME)/bin/nvcc
CUDA_INC = $(CUDA_HOME)/include 
CUDA_LIB = $(CUDA_HOME)/lib64

LIBS = -lcudart -d
NVCC_FLAGS = -Xcompiler "-fPIC" -gencode arch=compute_20,code=sm_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -I$(R_INC)

### Define objects
cu_sources := $(wildcard *cu)
cu_sharedlibs := $(patsubst %.cu, %.o,$(cu_sources))

cpp_sources := $(wildcard *.cpp)
cpp_sharedlibs := $(patsubst %.cpp, %.o, $(cpp_sources))

OBJECTS = $(cu_sharedlibs) $(cpp_sharedlibs)

all : rcppcuda.so

rcppcuda.so: $(OBJECTS)

%.o: %.cpp $(cpp_sources)
        $(CXX) $< -c -fPIC -I$(R_INC) -I$(RCPP_INC)

%.o: %.cu $(cu_sources)
        $(NVCC) $(NVCC_FLAGS) -I$(CUDA_INC) $< -c
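
Since a configure file is recommended above for portability, here is a hedged sketch (not part of the original answer; the @...@ placeholder names are my own) of how a src/Makevars.in could keep the same rules while leaving the machine-specific paths to be filled in at install time:

# src/Makevars.in -- sketch only: identical to the Makevars above except that
# the machine-specific values become placeholders for a configure script
# (see the configure sketch near the end of this answer) to substitute.
CUDA_HOME = @CUDA_HOME@
R_HOME    = @R_HOME@
CXX       = @CXX@

# ... the remainder is unchanged from the Makevars shown above ...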

A follow-up point (as you say this is a learning exercise):

A. You aren't using one of the parts of Rcpp that make it such a wonderful package, namely 'attributes'. Here is how your cpp file should look:

#include <Rcpp.h>
using namespace Rcpp;

extern "C"
void someCUDAcode();

//[[Rcpp::export]]
SEXP someCPPcode(SEXP r) {
  S4 c(r);
  double *x = REAL(c.slot("x"));
  int *y = INTEGER(c.slot("y"));
  x[0] = 500.0;
  y[1] = 1000;
  someCUDAcode();
  return R_NilValue;
}

This will automatically generate the corresponding RcppExports.cpp and RcppExports.R files and you no longer need a .Call function yourself. You just call the function. Now .Call('someCPPcode', r) becomes someCPPcode(r) :)
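
As a minimal sketch of the R-side workflow (assuming you work from the package's top-level directory, and that MyClass has a numeric slot x and an integer slot y, as the output above suggests):

library(Rcpp)

# (Re)generate RcppExports.cpp and RcppExports.R from the // [[Rcpp::export]] tags
compileAttributes(".")

# After reinstalling the package (R CMD INSTALL), call the generated wrapper directly
library(rcppcuda)
r <- new("MyClass", x = as.numeric(1:10), y = 1:10)
someCPPcode(r)   # replaces .Call("someCPPcode", r)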

For completeness, here is the updated someCUDAcode.cu file:

// Placeholder kernel: it only increments local copies, so it has no observable
// effect; it exists just to demonstrate compiling and launching device code.
__global__ void mykernel(int a){
  int id = threadIdx.x;
  int b = a;
  b++;
  id++;
}

// Host-side wrapper exposed to the C++ code via extern "C" (cf. point 5 above);
// launches one block of one thread.
extern "C"
void someCUDAcode() {
  mykernel<<<1, 1>>>(1);
}

With respect to a configure file (using autoconf), you are welcome to check out my gpuRcuda package, which uses Rcpp, CUDA, and ViennaCL (a C++ GPU computing library).
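
A configure script does not have to use autoconf; as a hedged sketch (my own illustration, not taken from gpuRcuda), a minimal hand-written configure that fills in the @...@ placeholders of the src/Makevars.in sketched above could look like this:

#!/bin/sh
# configure -- minimal sketch: locate the CUDA toolkit and generate
# src/Makevars from src/Makevars.in by substituting the placeholders.

# R CMD INSTALL exports R_HOME when it runs configure; fall back to asking R.
: "${R_HOME:=$(R RHOME)}"
: "${CUDA_HOME:=/usr/local/cuda}"

if [ ! -x "${CUDA_HOME}/bin/nvcc" ]; then
    echo "configure: nvcc not found under ${CUDA_HOME}; set CUDA_HOME and re-run" >&2
    exit 1
fi

sed -e "s|@CUDA_HOME@|${CUDA_HOME}|g" \
    -e "s|@R_HOME@|${R_HOME}|g" \
    -e "s|@CXX@|${CXX:-g++}|g" \
    src/Makevars.in > src/Makevars

With this in place (and the script made executable), R CMD INSTALL runs configure before compiling, so the generated src/Makevars always reflects the local CUDA and R installations.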
