Rcpp and CULA: segmentation fault

Question

I extracted the relevant parts of the gputools R package to run a QR decomposition on my GPU from Rcpp, by dynamically loading a shared library that links against culatools. Everything runs smoothly both in the terminal and in R.app on my Mac, and the results agree with R's qr() function. The problem is that a segmentation fault occurs when exiting R.app (it does not occur when using the terminal):

*** caught segfault ***
address 0x10911b050, cause 'memory not mapped'

I think I have narrowed the problem down to the pointers 'a' and 'tau' in the .c file that links against culatools:

#include<cula.h>

void gpuQR(const int *m, const int *n, float *a, const int *lda, float *tau)
{
    culaInitialize();   
    culaSgeqrf(m[0], n[0], a, lda[0], tau);
    culaShutdown();
}

I compiled the .c file on my Mac using:

/usr/local/cuda/bin/nvcc -gencode arch=compute_10,code=sm_10 -gencode arch=compute_11,code=sm_11 -gencode arch=compute_12,code=sm_12 -gencode arch=compute_13,code=sm_13 -gencode arch=compute_20,code=sm_20 -c -I. -I/usr/local/cula/include -m64 -Xcompiler -fPIC gpuQR.c -o gpuQR.o
/usr/local/cuda/bin/nvcc -gencode arch=compute_10,code=sm_10 -gencode arch=compute_11,code=sm_11 -gencode arch=compute_12,code=sm_12 -gencode arch=compute_13,code=sm_13 -gencode arch=compute_20,code=sm_20 -shared -m64 -Xlinker -rpath,/usr/local/cula/lib64 -L/usr/local/cula/lib64 -lcula_core -lcula_lapack -lcublas -o gpuQR.so gpuQR.o

I wrote a .cpp file that uses Rcpp and dynamically loads the shared library gpuQR.so:

#include <Rcpp.h>
#include <dlfcn.h>

using namespace Rcpp;
using namespace std;

// Matches the signature of gpuQR in the .c file (const int* for the sizes)
typedef void (*func)(const int*, const int*, float*, const int*, float*);

RcppExport SEXP gpuQR_Rcpp(SEXP x_, SEXP n_rows_, SEXP n_cols_)
{       
    vector<float> x = as<vector<float> >(x_);
    int n_rows = as<int>(n_rows_);
    int n_cols = as<int>(n_cols_);
    vector<float> scale(n_cols);

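    void* lib_handle = dlopen("path/gpuQR.so", RTLD_LAZY);
    if (!lib_handle) 
    { 
        // dlopen failed: report the reason
        Rcout << dlerror() << endl; 
    } else {
        func gpuQR = (func) dlsym(lib_handle, "gpuQR");  
        if (gpuQR) {
            // sizes are passed by address because gpuQR takes pointer arguments
            gpuQR(&n_rows, &n_cols, &(x[0]), &n_rows, &(scale[0]));
        } else {
            Rcout << "symbol gpuQR not found" << endl;
        }
        // close only a handle that was actually opened
        dlclose(lib_handle);
    }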

    for(int ii = 1; ii < n_rows; ii++)
    {
        for(int jj = 0; jj < n_cols; jj++)
        {
            if(ii > jj) { x[ii + jj * n_rows] *= scale[jj]; }
        }
    }

    return wrap(x);
}
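A defensive version of the load, look up, call, close sequence can be sketched as follows. This is only an illustration under assumptions: `call_symbol` and `parse_fn` are hypothetical names that do not appear in the original code, and the looked-up function is assumed to have the signature `int(const char*)`. Whether missing checks of this kind have anything to do with the crash on exit is not established here.

```cpp
#include <dlfcn.h>
#include <string>

// Hypothetical function-pointer type for the symbol being looked up.
typedef int (*parse_fn)(const char*);

// Hypothetical helper: opens the library at `path`, looks up `symbol`, calls
// it with `arg`, and closes the library again. A null `path` asks dlopen for
// a handle to the running program itself.
// Returns false and fills `error` when the library or the symbol is missing.
bool call_symbol(const char* path, const char* symbol,
                 const char* arg, int& result, std::string& error)
{
    void* handle = dlopen(path, RTLD_LAZY);
    if (!handle) {
        const char* msg = dlerror();
        error = msg ? msg : "dlopen failed";
        return false;  // nothing was opened, so nothing to close
    }

    dlerror();  // clear any stale error state before dlsym
    void* sym = dlsym(handle, symbol);
    const char* msg = dlerror();
    if (msg || !sym) {
        error = msg ? msg : "symbol resolved to null";
        dlclose(handle);
        return false;
    }

    parse_fn fn = reinterpret_cast<parse_fn>(sym);
    result = fn(arg);
    dlclose(handle);  // only ever reached with a valid handle
    return true;
}
```

The point of the sketch is that `dlclose` is never called on a null handle and the function pointer is never called without being checked first.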

I compiled the .cpp file in R using:

library(Rcpp)  
PKG_LIBS <- sprintf('%s $(LAPACK_LIBS) $(BLAS_LIBS) $(FLIBS)', Rcpp:::RcppLdFlags()) 
PKG_CPPFLAGS <- sprintf('%s', Rcpp:::RcppCxxFlags())  
Sys.setenv(PKG_LIBS = PKG_LIBS , PKG_CPPFLAGS = PKG_CPPFLAGS) 
R <- file.path(R.home(component = 'bin'), 'R') 
file <- 'path/gpuQR_Rcpp.cpp'
cmd <- sprintf('%s CMD SHLIB %s', R, paste(file, collapse = ' '))
system(cmd)

and ran an example:

dyn.load('path/gpuQR_Rcpp.so')

set.seed(100)
A <- matrix(rnorm(9), 3, 3)
n_row <- nrow(A)
n_col <- ncol(A)

res <- .Call('gpuQR_Rcpp', c(A), n_row, n_col)
matrix(res, n_row, n_col)

           [,1]       [,2]       [,3]
[1,]  0.5250958 -0.8666927  0.8594266
[2,] -0.2504899 -0.3878644 -0.1277837
[3,]  0.1502908  0.4742033 -0.8804248

qr(A)$qr

          [,1]       [,2]       [,3]
[1,]  0.5250957 -0.8666925  0.8594266
[2,] -0.2504899 -0.3878643 -0.1277838
[3,]  0.1502909  0.4742033 -0.8804247
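The two outputs agree to roughly seven significant digits. That is about what single precision allows: `culaSgeqrf` operates on `float`, while R's `qr()` operates on `double`. A minimal sketch of how such a comparison could be made explicit is shown here; both function names are hypothetical, and the default tolerance of 8 float roundings is an arbitrary choice for illustration.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: relative error introduced by storing a nonzero double
// in a float and reading it back, as happens when an R numeric vector is
// converted to vector<float>.
double float_roundtrip_error(double value)
{
    float narrowed = static_cast<float>(value);
    double widened = static_cast<double>(narrowed);
    return std::fabs(widened - value) / std::fabs(value);
}

// Hypothetical helper: true when a and b differ by no more than `ulps`
// float roundings relative to the larger magnitude. The default of 8 is an
// assumption, not a value taken from the original post.
bool agree_in_single_precision(double a, double b, double ulps = 8.0)
{
    double scale = std::fmax(std::fabs(a), std::fabs(b));
    double tol = ulps * std::numeric_limits<float>::epsilon() * scale;
    return std::fabs(a - b) <= tol;
}
```

Calling `agree_in_single_precision(0.5250958, 0.5250957)` with the first entries of the two matrices above is one way to check that a difference is within single-precision noise rather than a real discrepancy.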

Does anybody have an idea how to fix the segmentation fault?

Answer

There is actually no need to dynamically load a shared library that links against culatools. I considered linking directly at first, but could not get the .cpp file to compile with Rcpp. Anyway, the new .cpp file is:

#include<Rcpp.h>
#include<cula.h>

using namespace Rcpp;
using namespace std;

RcppExport SEXP gpuQR_Rcpp(SEXP x_, SEXP n_rows_, SEXP n_cols_)
{       
    vector<float> x = as<vector<float> >(x_);
    int n_rows = as<int>(n_rows_);
    int n_cols = as<int>(n_cols_);  
    vector<float> scale(n_cols);

    culaInitialize();   
    culaSgeqrf(n_rows, n_cols, &(x[0]), n_rows, &(scale[0]));
    culaShutdown();

    for(int ii = 1; ii < n_rows; ii++)
    {
        for(int jj = 0; jj < n_cols; jj++)
        {
            if(ii > jj) { x[ii + jj * n_rows] *= scale[jj]; }
        }
    }

    return wrap(x);
}
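The double loop above walks the strictly lower triangle of the column-major matrix and multiplies each entry in column jj by scale[jj]. The same step can be sketched as a standalone function, which makes the indexing easy to check without a GPU. `scale_below_diagonal` is a hypothetical name, not part of the original code.

```cpp
#include <cstddef>
#include <vector>

// Multiplies every entry strictly below the diagonal of a column-major
// n_rows x n_cols matrix by the factor stored for its column.
// Element (ii, jj) lives at index ii + jj * n_rows.
void scale_below_diagonal(std::vector<float>& x,
                          const std::vector<float>& scale,
                          std::size_t n_rows, std::size_t n_cols)
{
    for (std::size_t jj = 0; jj < n_cols; ++jj) {
        // Rows jj + 1 .. n_rows - 1 are exactly the ones with ii > jj.
        for (std::size_t ii = jj + 1; ii < n_rows; ++ii) {
            x[ii + jj * n_rows] *= scale[jj];
        }
    }
}
```

Starting the inner loop at jj + 1 visits the same elements as the `if(ii > jj)` test, just without iterating over entries that are skipped anyway.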

The .cpp file is compiled using:

library(Rcpp)  
PKG_LIBS <- sprintf('-Wl,-rpath,/usr/local/cula/lib64 -L/usr/local/cula/lib64 -lcula_lapack %s $(LAPACK_LIBS) $(BLAS_LIBS) $(FLIBS)', Rcpp:::RcppLdFlags()) 
PKG_CPPFLAGS <- sprintf('-I/usr/local/cula/include %s', Rcpp:::RcppCxxFlags())  
Sys.setenv(PKG_LIBS = PKG_LIBS , PKG_CPPFLAGS = PKG_CPPFLAGS) 
R <- file.path(R.home(component = 'bin'), 'R') 
file <- 'path/gpuQR_inc.cpp'
cmd <- sprintf('%s CMD SHLIB %s', R, paste(file, collapse = ' '))
system(cmd)

where I set the appropriate path to culatools. The whole thing does not run any faster, but there is no longer any need to compile a separate shared library that links against culatools and load it dynamically.

I think this is a nice alternative to the gputools R package for extending R with C++ and performing linear algebra operations on the GPU.
