Is there an R function that applies a function to each pair of columns?
Question
I often need to apply a function to each pair of columns in a data frame or matrix and return the results in a matrix. At the moment I always write a loop to do this. For instance, to build a matrix containing the p-values of correlations I write:
df <- data.frame(x = rnorm(100), y = rnorm(100), z = rnorm(100))
n <- ncol(df)
foo <- matrix(0, n, n)
for (i in 1:n) {
  for (j in i:n) {
    foo[i, j] <- cor.test(df[, i], df[, j])$p.value
  }
}
foo[lower.tri(foo)] <- t(foo)[lower.tri(foo)]
foo
          [,1]      [,2]      [,3]
[1,] 0.0000000 0.7215071 0.5651266
[2,] 0.7215071 0.0000000 0.9019746
[3,] 0.5651266 0.9019746 0.0000000
This works, but it is quite slow for very large matrices. I can write a function for this in R (without bothering to halve the running time by assuming a symmetric result, as above):
Papply <- function(x, fun) {
  n <- ncol(x)
  foo <- matrix(0, n, n)
  for (i in 1:n) {
    for (j in 1:n) {
      foo[i, j] <- fun(x[, i], x[, j])
    }
  }
  return(foo)
}
Or a function with Rcpp:
library("Rcpp")
library("inline")
src <- '
  NumericMatrix x(xR);
  Function f(fun);
  NumericMatrix y(x.ncol(), x.ncol());
  for (int i = 0; i < x.ncol(); i++)
  {
    for (int j = 0; j < x.ncol(); j++)
    {
      y(i,j) = as<double>(f(wrap(x(_,i)), wrap(x(_,j))));
    }
  }
  return wrap(y);
'
Papply2 <- cxxfunction(signature(xR = "numeric", fun = "function"), src, plugin = "Rcpp")
But both are quite slow even on a fairly small dataset of 100 variables (I thought the Rcpp function would be faster, but I guess the constant conversion between R and C++ takes its toll):
> system.time(Papply(matrix(rnorm(100*300),300,100),function(x,y)cor.test(x,y)$p.value))
   user  system elapsed
   3.73    0.00    3.73
> system.time(Papply2(matrix(rnorm(100*300),300,100),function(x,y)cor.test(x,y)$p.value))
   user  system elapsed
   3.71    0.02    3.75
So my questions are:
- Given how simple these functions are, I assume this already exists somewhere in R. Is there an apply or plyr function that does this? I have looked for one but haven't been able to find it.
- If so, is it faster?
Recommended answer
It wouldn't be faster, but you can use outer to simplify the code. It does require a vectorized function, so here I've used Vectorize to make a vectorized version of the function that gets the p-value of the correlation between two columns.
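The need for a vectorized function follows from how outer calls FUN: it expands the two inputs into all index pairs and calls FUN once on the full-length vectors, then reshapes the result. A small sketch that makes this visible (the names probe, calls and arg_len are made up for this illustration):

```r
# Record how often FUN is called and how long its arguments are.
calls <- 0
arg_len <- NULL
probe <- function(i, j) {
  calls <<- calls + 1
  arg_len <<- length(i)
  i * 10 + j  # plain arithmetic is already vectorized, so outer accepts it
}

res <- outer(1:3, 1:3, probe)
calls    # number of times outer invoked probe
arg_len  # length of the index vector probe received
res      # entry [i, j] holds i * 10 + j
```

A function that only handles a single (i, j) pair, such as one calling cor.test, returns one value instead of one value per pair, which is why it has to be wrapped before it is handed to outer.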
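df <- data.frame(x = rnorm(100), y = rnorm(100), z = rnorm(100))
n <- ncol(df)
# p-value of the correlation test between columns i and j of data
corpij <- function(i, j, data) cor.test(data[, i], data[, j])$p.value
# vectorize over the column indices so that outer can pass whole index vectors
corp <- Vectorize(corpij, vectorize.args = c("i", "j"))
outer(1:n, 1:n, corp, data = df)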
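Since Vectorize still calls cor.test once per pair, the speed does not change. For the specific case in the question (two-sided Pearson p-values, which is the cor.test default), a different technique may help: compute the whole correlation matrix with cor and derive the p-values from the t statistic directly. This is a sketch under assumptions rather than a general replacement: it assumes Pearson correlation and complete data with no missing values, and pairwise_cor_p is a hypothetical helper name.

```r
# Hypothetical helper: two-sided p-values for all pairwise Pearson correlations.
# Assumes complete data (no NAs) and at least 3 rows.
pairwise_cor_p <- function(x) {
  x <- as.matrix(x)
  dof <- nrow(x) - 2                     # degrees of freedom of the t test
  r <- cor(x)                            # all pairwise correlations at once
  t_stat <- r * sqrt(dof / (1 - r^2))    # t statistic of the Pearson test
  p <- 2 * pt(-abs(t_stat), dof)         # two-sided p-value
  diag(p) <- 0                           # a column correlates perfectly with itself
  p
}

set.seed(1)
df <- data.frame(x = rnorm(100), y = rnorm(100), z = rnorm(100))
p_fast <- pairwise_cor_p(df)
p_fast
```

The result keeps the column names as dimnames. Other tests (Spearman, Kendall, one-sided alternatives) would need their own formulas, so the outer approach remains the general one.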