How do I pass variables to a custom function in ddply?


Question

Consider the following data:

d = data.frame(
    experiment = as.factor(c("foo", "foo", "foo", "bar", "bar")),
    si = runif(5),
    ti = runif(5)
)

I would like to perform a correlation test between si and ti for each level of the experiment factor. So I thought I'd run:

ddply(d, .(experiment), cor.test)

But how do I pass the values of si and ti to the cor.test call? I tried this:

> ddply(d, .(experiment), cor.test, x = si, y = ti)
Error in .fun(piece, ...) : object 'si' not found
> ddply(d, .(experiment), cor.test, si, ti)
Error in match.arg(alternative) : 
  'arg' must be NULL or a character vector

Is there anything obvious I'm missing? The plyr documentation doesn't include an example that covers this case. Most commands I see only use summarize as the function call, but the usual things I'm used to doing with summarize don't work here, as can be seen above.

Recommended answer

ddply splits your data frame by the variables you select (experiment here) and then passes each resulting subset of the data frame to your function. In your case, cor.test doesn't accept a data frame as input, so you need a translation layer:

library(plyr)

d <- data.frame(
  experiment = as.factor(c("foo", "foo", "foo", "bar", "bar", "bar")),
  si = runif(6),
  ti = runif(6)
)
ddply(d, .(experiment), function(d.sub) cor.test(d.sub$si, d.sub$ti)$statistic)
#   experiment         t
# 1        bar 0.1517205
# 2        foo 0.3387682

Also, your output has to be something like a vector or a data frame, which is why I just chose $statistic above, but you could have added multiple variables if you wanted.
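As a complement to the anonymous-function approach, here is a minimal sketch of passing extra arguments through ddply itself: anything supplied after the function is forwarded to it for every piece. The helper name cor_piece, its x/y/method parameters, and the seeded sample data are assumptions made for illustration, not part of the original answer. Column names are passed as strings because bare names such as si are evaluated outside the data frame, which is most likely what produced the "object 'si' not found" error in the question.

```r
library(plyr)

set.seed(42)  # arbitrary seed so the sample data is reproducible
d <- data.frame(
  experiment = factor(rep(c("foo", "bar"), each = 4)),
  si = runif(8),
  ti = runif(8)
)

# Hypothetical wrapper: receives one piece of the data frame plus the names
# of the two columns to correlate, and returns a one-row data frame.
cor_piece <- function(piece, x, y, method = "pearson") {
  res <- cor.test(piece[[x]], piece[[y]], method = method)
  data.frame(estimate = unname(res$estimate), p.value = res$p.value)
}

# Arguments after the function (x, y, method) are forwarded to cor_piece
# once per experiment level.
res <- ddply(d, .(experiment), cor_piece, x = "si", y = "ti", method = "spearman")
res
```

The result has one row per experiment level, with the grouping column followed by the columns the wrapper returned.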

Side note: I had to add a value to the input data frame, because cor.test won't run on only 2 values (which was the case for "bar"); with the default Pearson method it needs at least 3 finite observations. If you want more comprehensive stats, you can try:

ddply(d, .(experiment), function(d.sub) {
  as.data.frame(cor.test(d.sub$si, d.sub$ti)[c("statistic", "parameter", "p.value", "estimate")])
} )
#   experiment statistic parameter   p.value  estimate
# 1        bar 0.1517205         1 0.9041428 0.1500039
# 2        foo 0.3387682         1 0.7920584 0.3208567 

Note that since we're now returning something more complex than just a vector, we need to coerce it to a data.frame. If you want to include more complex values (e.g. the confidence interval, which is a two-value result), you would have to simplify them first.
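One way to do that simplification, sketched under some assumptions: split the two-element conf.int into separate scalar columns before building the row. The column names conf.low and conf.high and the seeded sample data are illustrative choices. The sample uses 5 observations per group because, with the default Pearson method, cor.test only reports a confidence interval when a group has at least 4 complete pairs.

```r
library(plyr)

set.seed(1)  # arbitrary seed so the sample data is reproducible
d <- data.frame(
  experiment = factor(rep(c("foo", "bar"), each = 5)),
  si = runif(10),
  ti = runif(10)
)

ci <- ddply(d, .(experiment), function(d.sub) {
  res <- cor.test(d.sub$si, d.sub$ti)
  # conf.int is a length-2 numeric vector; flatten it into two columns
  data.frame(
    estimate  = unname(res$estimate),
    conf.low  = res$conf.int[1],
    conf.high = res$conf.int[2]
  )
})
ci
```

Each experiment level then contributes a single row, with the interval bounds sitting next to the estimate.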
