How to use group_by() and do() in dplyr to apply a function for each factor level

Views: 74 Published: 2020/10/26 4:33:39 Tags: r function dplyr

Question

I wrote a function (weighted.sd) that gives me some weighted statistics (mean, SD, standard error and a 95% confidence interval). I want to apply this function to each level of a factor variable (regions) and then use the weighted statistics for each region in a ggplot2 graph with error bars (hence the 95% confidence interval).

I also tried tapply and a for-loop, but I didn't get it right. Also, I like to use dplyr as much as I can, because it is easy to read and understand.

This is my best attempt:

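library(dplyr)

# example data
# note: cbind() turns the factor into its integer codes, so every row ends up in its own group
data <- as.data.frame(cbind(rnorm(1:50), as.factor(rnorm(1:50)), rnorm(1:50)))
colnames(data) <- c("index_var", "factor_var", "weight_var")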

weighted.sd <- function(x,weight){
  na <- is.na(x) | is.na(weight)
  x <- x[!na]
  weight <- weight[!na]  
  sum.w <- sum(weight)
  sum.w2 <- sum(weight^2)
  mean.w <- sum(x * weight) / sum(weight)
  x.var.w<-    (sum.w / (sum.w^2 - sum.w2)) * sum(weight * (x - mean.w)^2)
  x.sd.w<-sqrt((sum.w / (sum.w^2 - sum.w2)) * sum(weight * (x - mean.w)^2))
  SE<- x.sd.w / sqrt(sum(weight))
  error <- qnorm(0.975)*x.sd.w/sqrt(sum(weight))
  left <- mean.w-error
  right <- mean.w+error  
  return(cbind(mean.w,x.sd.w,SE,error,left,right))
}

test<- data %>% 
  group_by(factor_var) %>% 
  do(as.data.frame(weighted.sd(x=index_var,weight=weight_var)))
test

This results in an error message (sorry, part of it is in German, but you can reproduce it with the code):

   Error in as.data.frame(weighted.sd(x = index_var, weight = weight_var)) : 
      Fehler bei der Auswertung des Argumentes 'x' bei der Methodenauswahl
    für Funktion 'as.data.frame': Error in weighted.sd(x = index_var, weight = weight_var) : 
      object 'index_var' not found

Recommended answer

Inside do() in dplyr, columns cannot be referenced by bare name. Access them through the current group's data frame with .$, like this:

test<- data %>% 
  group_by(factor_var) %>% 
  do(as.data.frame(weighted.sd(x=.$index_var,weight=.$weight_var)))
test
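As a small illustration of why .$ is needed: inside do(), the placeholder . stands for the data frame holding the rows of the current group, so .$v is an ordinary column lookup on that data frame. The toy data frame toy and its columns g and v are made up for this sketch and are not part of the question.

```r
library(dplyr)

# Hypothetical toy data: group "a" has three rows, group "b" has two
toy <- data.frame(g = c("a", "a", "a", "b", "b"),
                  v = c(1, 2, 3, 10, 20))

# Inside do(), `.` is the data frame of the current group,
# so nrow(.) counts its rows and .$v pulls out one column
res <- toy %>%
  group_by(g) %>%
  do(data.frame(n_rows = nrow(.), total = sum(.$v)))
res
```

The result should have one row per group, with the grouping column g added back in front of n_rows and total.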

With this change the original call works:

> test
Source: local data frame [50 x 7]
Groups: factor_var [50]

   factor_var      mean.w x.sd.w    SE error  left right
        (dbl)       (dbl)  (dbl) (dbl) (dbl) (dbl) (dbl)
1           1  1.79711934    NaN   NaN   NaN   NaN   NaN
2           2 -0.70698012    NaN   NaN   NaN   NaN   NaN
3           3 -0.85125760    NaN   NaN   NaN   NaN   NaN
4           4 -0.93903314    NaN   NaN   NaN   NaN   NaN
5           5  0.09629631    NaN   NaN   NaN   NaN   NaN
6           6  1.02720022    NaN   NaN   NaN   NaN   NaN
7           7  1.35090758    NaN   NaN   NaN   NaN   NaN
8           8  0.67814249    NaN   NaN   NaN   NaN   NaN
9           9 -0.28251464    NaN   NaN   NaN   NaN   NaN
10         10  0.38572499    NaN   NaN   NaN   NaN   NaN
..        ...         ...    ...   ...   ...   ...   ...

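However, your example data is not well suited to this function, which is why the output above contains NaN. Negative weights (data$weight_var) can make the expression inside sqrt() negative. In addition, every row ends up in its own group (50 groups for 50 rows), and with a single observation the term sum.w^2 - sum.w2 is zero, so the variance evaluates to NaN as well.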
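To show the same pipeline on data that avoids both problems, here is a minimal sketch. The data frame data2, the level names and the seed are assumptions made up for illustration: three factor levels with 20 rows each and strictly positive weights. The function uses the same formulas as weighted.sd in the question, but returns a one-row data frame directly so that as.data.frame() is not needed.

```r
library(dplyr)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Assumed replacement data: 3 factor levels, strictly positive weights
data2 <- data.frame(
  index_var  = rnorm(60),
  factor_var = factor(rep(c("north", "south", "west"), each = 20)),
  weight_var = runif(60, min = 0.5, max = 2)
)

# Same formulas as weighted.sd() in the question,
# returned as a one-row data frame
weighted.sd <- function(x, weight) {
  na <- is.na(x) | is.na(weight)
  x <- x[!na]
  weight <- weight[!na]
  sum.w  <- sum(weight)
  sum.w2 <- sum(weight^2)
  mean.w <- sum(x * weight) / sum.w
  x.sd.w <- sqrt((sum.w / (sum.w^2 - sum.w2)) * sum(weight * (x - mean.w)^2))
  SE     <- x.sd.w / sqrt(sum.w)
  error  <- qnorm(0.975) * SE
  data.frame(mean.w, x.sd.w, SE, error,
             left = mean.w - error, right = mean.w + error)
}

# One row of weighted statistics per factor level
test2 <- data2 %>%
  group_by(factor_var) %>%
  do(weighted.sd(x = .$index_var, weight = .$weight_var))
test2
```

With more than one observation per group and positive weights, sum.w^2 - sum.w2 is positive, so the standard deviation and interval bounds come out finite. The left and right columns could then be mapped to ymin and ymax in ggplot2's geom_errorbar() for the plot described in the question.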
