通过R在C中操纵矩阵 [英] Manipulating matrices in C through R

查看:129
本文介绍了通过R在C中操纵矩阵的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想用C编写一些矩阵操作函数,然后将它们传递到R,R将在其中提供矩阵,并获取操作结果.我有一个如下所示的测试函数(请不要介意它的功能,在我的原始函数中,我将需要从每一行中选择一个随机元素,并对它们进行一些计算,然后返回一个由那些随机组成的数组从每一行中选择元素,换句话说,我必须有两个for循环才能遍历矩阵的所有元素.

I want to write some matrix manipulation functions in C, and then pass them onto R, where the matrix will be provided by R, and get the results of the manipulations. I have a test function as shown below (please don't mind what it does, in my original functions, I will need to pick a random element from each row, and do some computation on them, and return an array composed of those randomly chosen elements from each row, in other words, I must have two for loops to go through all the elements of the matrix).

void multMat(double **A, int *r, int *c, double *s)
{
    int i, j;

    for (i = 0; i < *r; ++i)
    {

       for (j = 0; j < *c; ++j) 
       {
           if (j == 5)
               s[i] = A[i][j] * A[i][0];
       }
    }
}

我用R CMD SHLIB multMat.c编译了它,并为我生成了multMat.so.然后,在R端,我有这样的东西:

I compiled this with R CMD SHLIB multMat.c, and it produces multMat.so for me. Then, on the R side, I have something like this:

dyn.load("multMat.so")

multMat <- function(A)
{
  .C("multMat", A=as.double(A), r=as.integer(nrow(A)), c=as.integer(ncol(A)), s=as.double(nrow(A)))
}

然后,我像在R Studio中一样创建了一个测试矩阵,并将其称为:

Afterwards, I created a test matrix like in R Studio and called this function:

A <- matrix(1:100, 10, 10)
multMat(A)

问题是当我运行此功能时,R Studio崩溃.我想C函数的定义方式存在一些问题.有什么想法吗?

The problem is that when I run this function, R Studio crashes. I guess there is some problem with how the C function is defined. Any ideas?

推荐答案

哪里错了

您将A定义为double **

void multMat(double **A, int *r, int *c, double *s)

同时传递double *:

.C("multMat", A=as.double(A), r=as.integer(nrow(A)), c=as.integer(ncol(A)), s=as.double(nrow(A)))

您应该使用一维数组重写C函数.将您的函数定义为:

You should rewrite you C function using 1 dimensional array. Define your function as:

void multMat(double *A, int *r, int *c, double *s)

,然后将A[i][j]替换为A[j * r + i]. (如果我没弄错,r是主要方面.)

and replace A[i][j] by A[j * r + i]. (If I did not get you wrong, r is the leading dimension.)

性能问题:

目前,这是i-j循环,其中j-loop作为内部循环,因此您正在扫描最内部循环中的矩阵行.这是不适合缓存的.您应该互换循环以获得j-i循环.

At the moment, it is i-j loop with j-loop as the inner loop, so that you are scanning a row of the matrix in the innermost loop. This is not cache friendly. You should interchange loop to get j-i loop.

我认为您实际上已经意识到缓存问题.在C语言中,矩阵按行优先顺序存储,因此i-j循环是最优的;但是在R中,矩阵是以列优先级存储的,因此j-i循环是最佳的.

I reckon that you are in fact aware of the cache issue. In C, matrices are stored in row-major-order, so i-j loop is optimal; but in R, matrices are stored in column-major-order, so j-i loop is optimal.

也许矩阵的不同存储方式将导致您现有代码的某些问题.您可能在这里三思而后行.原始的C代码假定它采用按行存储的矩阵,而如果在R中初始化矩阵并将其提供给C,则按列存储.可能需要进行一些更改.如果这导致代码更改过多,则可以尝试将R矩阵的转置传递给C.

Perhaps the different storage style of matrices will cause some problem of your existing code. You may think it twice here. Your original C code assumes it takes a matrix that is stored by row, while if you initialize your matrix in R and feed it to C, it is stored by column. Possibly some changes are needed. If this results in too much change of your code, you may try passing the transpose of your R matrix to C.

您还应该在C代码中使用更多本地/自动变量,而不是使用指针.例如,替换

for (i = 0; i < *r; ++i)

作者

int r_local = *r;
for (i = 0; i < r_local; ++i)

通过 CPU寄存器重用指令减少(无需在每次迭代中取消引用该点),就可以提高性能.

You get performance boots by CPU register reusing and instruction reduction (no need to dereference the point every iteration).

R支持的C/FORTRAN数据类型

因此,没有办法直接从R传递矩阵,然后对矩阵使用随意的约定,这意味着A[i][j]在C端吗?

不,没有. R不支持double **. R中的矩阵以列主格式存储到一个长向量中,属于double *类型.

No, there isn't. R does not support double **. Matrices in R are stored in column-major-format into a long vector, and belong to double * type.

R storage mode  C type        FORTRAN type
logical         int *         INTEGER
integer         int *         INTEGER
double          double *      DOUBLE PRECISION
complex         Rcomplex *    DOUBLE COMPLEX
character       char **       CHARACTER*255
raw unsigned    char *        none

请参见编写R扩展的5.2节.

这篇关于通过R在C中操纵矩阵的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆