Is there an R function that applies a function to each pair of columns?


Question


I often need to apply a function to each pair of columns in a data frame or matrix and return the results in a matrix. Currently I always write a loop to do this. For instance, to make a matrix containing the p-values of correlations, I write:

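df <- data.frame(x=rnorm(100),y=rnorm(100),z=rnorm(100))
n <- ncol(df)
foo <- matrix(0,n,n)

# Fill the upper triangle (including the diagonal) with pairwise p-values
for (i in 1:n)
{
    for (j in i:n)
    {
        foo[i,j] <- cor.test(df[,i],df[,j])$p.value
    }
}

# Mirror the upper triangle into the lower triangle
foo[lower.tri(foo)] <- t(foo)[lower.tri(foo)]

foo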
          [,1]      [,2]      [,3]
[1,] 0.0000000 0.7215071 0.5651266
[2,] 0.7215071 0.0000000 0.9019746
[3,] 0.5651266 0.9019746 0.0000000
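The mirroring line at the end of the loop version is easy to misread. A toy 3 x 3 example, with arbitrary illustrative values in the upper triangle, shows what it does: lower.tri() selects positions (2,1), (3,1) and (3,2), and the transpose supplies the matching upper-triangle values (1,2), (1,3) and (2,3).

```r
# Toy matrix: only the upper triangle is filled, as after the i:n loop
m <- matrix(0, 3, 3)
m[1, 2] <- 12; m[1, 3] <- 13; m[2, 3] <- 23

# Copy each upper-triangle value into its mirrored lower-triangle position
m[lower.tri(m)] <- t(m)[lower.tri(m)]
m
```

After this, m[2, 1] is 12, m[3, 1] is 13 and m[3, 2] is 23, so the matrix is symmetric.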


This works, but it is quite slow for very large matrices. I can write a function for this in R (without bothering to halve the run time by exploiting the symmetry, as above):

# Apply fun to every ordered pair of columns of x; returns an ncol(x) x ncol(x) matrix
Papply <- function(x,fun)
{
    n <- ncol(x)
    foo <- matrix(0,n,n)
    for (i in 1:n)
    {
        for (j in 1:n)
        {
            foo[i,j] <- fun(x[,i],x[,j])
        }
    }
    return(foo)
}
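Because a correlation p-value is symmetric in its two arguments, Papply evaluates fun roughly twice as often as necessary. A minimal sketch of a symmetric variant built on base R's combn(), assuming fun(a, b) equals fun(b, a), that fun returns a single number, and that x has at least two columns; Papply_sym is a hypothetical name, not an existing function:

```r
# Papply_sym is a hypothetical helper, not part of base R or any package.
# Assumes fun(a, b) == fun(b, a) and that fun returns a single numeric value.
Papply_sym <- function(x, fun)
{
    x <- as.matrix(x)
    n <- ncol(x)                    # needs at least 2 columns for combn(n, 2)
    out <- matrix(0, n, n)
    # Each column of combn(n, 2) is one unordered pair (i, j) with i < j
    pairs <- combn(n, 2)
    vals <- apply(pairs, 2, function(p) fun(x[, p[1]], x[, p[2]]))
    out[t(pairs)] <- vals                          # upper triangle
    out[t(pairs[2:1, , drop = FALSE])] <- vals     # mirrored lower triangle
    # The diagonal still needs one call of fun per column
    diag(out) <- vapply(seq_len(n), function(i) fun(x[, i], x[, i]), numeric(1))
    out
}
```

Called as Papply_sym(df, function(x, y) cor.test(x, y)$p.value), this makes choose(n, 2) + n calls to cor.test() instead of n^2, which only halves the work; the per-call cost of cor.test() is unchanged.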


Or a function with Rcpp:

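library("Rcpp")
library("inline")

# C++ body compiled by inline::cxxfunction(); f wraps the R function passed in,
# so every f(...) call below still calls back into R
src <- 
'
NumericMatrix x(xR);
Function f(fun);
NumericMatrix y(x.ncol(),x.ncol());

for (int i = 0; i < x.ncol(); i++)
{
    for (int j = 0; j < x.ncol(); j++)
    {
        // call the R function on columns i and j and store its scalar result
        y(i,j) = as<double>(f(wrap(x(_,i)),wrap(x(_,j))));
    }
}
return wrap(y);
'

Papply2 <- cxxfunction(signature(xR="numeric",fun="function"),src,plugin="Rcpp")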


But both are quite slow, even on a fairly small dataset of 100 variables (I thought the Rcpp function would be faster, but I guess the constant conversion between R and C++ takes its toll):

> system.time(Papply(matrix(rnorm(100*300),300,100),function(x,y)cor.test(x,y)$p.value))
   user  system elapsed 
   3.73    0.00    3.73 
> system.time(Papply2(matrix(rnorm(100*300),300,100),function(x,y)cor.test(x,y)$p.value))
   user  system elapsed 
   3.71    0.02    3.75 

So my questions are:

  1. Due to the simplicity of these functions I assume this is already somewhere in R. Is there an apply or plyr function that does this? I have looked for it but haven't been able to find it.
  2. If so, is it faster?

Answer


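It wouldn't be faster, but you can use outer to simplify the code. outer requires a vectorized function, because it calls FUN only once, with the two index vectors expanded to cover every (i, j) combination. So here I've used Vectorize to make a vectorized version of the function that returns the p-value of the correlation between two columns.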

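df <- data.frame(x=rnorm(100),y=rnorm(100),z=rnorm(100))
n <- ncol(df)

# p-value of the correlation between columns i and j of data
corpij <- function(i,j,data) {cor.test(data[,i],data[,j])$p.value}
# outer() passes whole index vectors, so vectorize over i and j
corp <- Vectorize(corpij, vectorize.args=c("i","j"))
# Evaluate corp for every (i, j) pair; data is passed through to corp
outer(1:n,1:n,corp,data=df)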
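For this particular case (Pearson correlation p-values) there is a much faster route, although it is not a general pairwise apply: compute every correlation at once with cor(), then convert each r to a p-value with the same t statistic that cor.test() uses, t = sqrt(n - 2) * r / sqrt(1 - r^2) on n - 2 degrees of freedom. A sketch, assuming complete numeric data with no NAs, the default method = "pearson" and a two-sided alternative; cor_pvalues is a hypothetical name:

```r
# cor_pvalues is a hypothetical helper. It assumes numeric columns with no
# missing values, Pearson correlation and a two-sided test (cor.test defaults).
cor_pvalues <- function(x)
{
    x <- as.matrix(x)
    dof <- nrow(x) - 2              # degrees of freedom, as in cor.test()
    r <- cor(x)                     # all pairwise correlations in one call
    # A column is perfectly correlated with itself; pinning r to exactly 1
    # gives t = Inf and p = 0, and avoids sqrt() of a tiny negative number
    diag(r) <- 1
    t_stat <- sqrt(dof) * r / sqrt(1 - r^2)
    2 * pt(-abs(t_stat), dof)       # two-sided p-values as an n x n matrix
}
```

Applied to the df above, cor_pvalues(df) should match outer(1:n, 1:n, corp, data=df) up to floating-point rounding, while making a single cor() call instead of n^2 calls to cor.test(). With NAs present, cor() and cor.test() handle missing rows differently, so the results would no longer agree.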
