How to use group_by() and do() in dplyr to apply a function for each factor level
Question
I wrote a function (weighted.sd) that returns some weighted statistics (mean, SD, standard error and a 95% confidence interval). I want to apply this function to each level of a factor variable (regions) and then use the weighted statistics for each region in a ggplot2 graph with error bars (hence the 95% confidence interval).
I also tried tapply and a for loop, but I didn't get either to work. Also, I like to use dplyr as much as I can, because it is easy to read and understand.
This is my best attempt:
library(dplyr)

# example data
data <- as.data.frame(cbind(rnorm(1:50), as.factor(rnorm(1:50)), rnorm(1:50)))
colnames(data) <- c("index_var", "factor_var", "weight_var")

# weighted mean, SD, standard error and 95% confidence interval
weighted.sd <- function(x, weight) {
  # drop observations where either the value or the weight is missing
  na <- is.na(x) | is.na(weight)
  x <- x[!na]
  weight <- weight[!na]
  sum.w <- sum(weight)
  sum.w2 <- sum(weight^2)
  mean.w <- sum(x * weight) / sum(weight)
  x.var.w <- (sum.w / (sum.w^2 - sum.w2)) * sum(weight * (x - mean.w)^2)
  x.sd.w <- sqrt((sum.w / (sum.w^2 - sum.w2)) * sum(weight * (x - mean.w)^2))
  SE <- x.sd.w / sqrt(sum(weight))
  # half-width of the 95% confidence interval
  error <- qnorm(0.975) * x.sd.w / sqrt(sum(weight))
  left <- mean.w - error
  right <- mean.w + error
  return(cbind(mean.w, x.sd.w, SE, error, left, right))
}

test <- data %>%
  group_by(factor_var) %>%
  do(as.data.frame(weighted.sd(x = index_var, weight = weight_var)))
test
This results in an error message (sorry, part of it is in German, but you can reproduce it with the code):
Error in as.data.frame(weighted.sd(x = index_var, weight = weight_var)) :
Fehler bei der Auswertung des Argumentes 'x' bei der Methodenauswahl
für Funktion 'as.data.frame': Error in weighted.sd(x = index_var, weight = weight_var) :
object 'index_var' not found
Answer
When using do() in dplyr, you need to refer to columns through .$ for this to work. Inside do(), the dot stands for the data frame of the current group, so a bare column name such as index_var is not found:
test <- data %>%
  group_by(factor_var) %>%
  do(as.data.frame(weighted.sd(x = .$index_var, weight = .$weight_var)))
test
So this will work:
> test
Source: local data frame [50 x 7]
Groups: factor_var [50]
factor_var mean.w x.sd.w SE error left right
(dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl)
1 1 1.79711934 NaN NaN NaN NaN NaN
2 2 -0.70698012 NaN NaN NaN NaN NaN
3 3 -0.85125760 NaN NaN NaN NaN NaN
4 4 -0.93903314 NaN NaN NaN NaN NaN
5 5 0.09629631 NaN NaN NaN NaN NaN
6 6 1.02720022 NaN NaN NaN NaN NaN
7 7 1.35090758 NaN NaN NaN NaN NaN
8 8 0.67814249 NaN NaN NaN NaN NaN
9 9 -0.28251464 NaN NaN NaN NaN NaN
10 10 0.38572499 NaN NaN NaN NaN NaN
.. ... ... ... ... ... ... ...
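However, your example data is not well suited to this function. Every group contains a single observation (50 rows, 50 factor levels), so the denominator sum.w^2 - sum.w2 is zero and the weighted SD is undefined, which shows up as the NaNs above. The negative weights in data$weight_var are a further problem: they can make the argument of sqrt() negative, which also yields NaN.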
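To see the same approach return usable numbers, here is a minimal sketch with friendlier example data. The data set (5 factor levels with 10 rows each, strictly positive weights) and the seed are assumptions made for illustration and are not part of the original post. The formulas are the ones from the question's weighted.sd(), except that the function returns a one-row data frame directly so that do() can bind the rows.

```r
library(dplyr)

# Hypothetical example data (an assumption for illustration):
# 5 factor levels with 10 rows each, and strictly positive weights.
set.seed(1)
data <- data.frame(
  index_var  = rnorm(50),
  factor_var = factor(rep(letters[1:5], each = 10)),
  weight_var = runif(50, min = 0.5, max = 2)
)

# Same formulas as the question's weighted.sd(),
# returned as a one-row data frame instead of a matrix.
weighted.sd <- function(x, weight) {
  na <- is.na(x) | is.na(weight)
  x <- x[!na]
  weight <- weight[!na]
  sum.w  <- sum(weight)
  sum.w2 <- sum(weight^2)
  mean.w <- sum(x * weight) / sum.w
  x.sd.w <- sqrt((sum.w / (sum.w^2 - sum.w2)) * sum(weight * (x - mean.w)^2))
  SE     <- x.sd.w / sqrt(sum.w)
  error  <- qnorm(0.975) * SE
  data.frame(mean.w, x.sd.w, SE, error,
             left = mean.w - error, right = mean.w + error)
}

# Inside do(), `.` is the data frame of the current group.
test <- data %>%
  group_by(factor_var) %>%
  do(weighted.sd(x = .$index_var, weight = .$weight_var))
test
```

With several rows per group and positive weights, the result should have one row per factor level and no NaN values. In recent dplyr releases do() is marked as superseded; group_modify() is one alternative worth checking against your installed version, although the do() call shown here still runs.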