Run a custom function on a data frame in R, by group


Problem description




I'm having some trouble getting a custom function to run over each group in a data frame.

Here is some sample data:

set.seed(42)
tm <- as.numeric(c("1", "2", "3", "3", "2", "1", "2", "3", "1", "1"))
d <- as.numeric(sample(0:2, size = 10, replace = TRUE))
t <- as.numeric(sample(0:2, size = 10, replace = TRUE))
h <- as.numeric(sample(0:2, size = 10, replace = TRUE))

df <- as.data.frame(cbind(tm, d, t, h))
df$p <- rowSums(df[2:4])

I created a custom function to calculate the value w:

calc <- function(x) {
  data <- x
  w <- (1.27*sum(data$d) + 1.62*sum(data$t) + 2.10*sum(data$h)) / sum(data$p)
  w
}

When I run the function on the entire data set, I get the following answer:

calc(df)
[1] 1.664474

Ideally, I want to return results that are grouped by tm, e.g.:

tm     w
1    result of calc
2    result of calc
3    result of calc

So far I have tried using aggregate with my function, but I get the following error:

aggregate(df, by = list(tm), FUN = calc)
Error in data$d : $ operator is invalid for atomic vectors

I feel like I have stared at this too long and there is an obvious answer. Any advice would be appreciated.

Solution

Using dplyr, group by tm and call calc on each group inside do():

library(dplyr)
df %>% 
   group_by(tm) %>%
   do(data.frame(val=calc(.)))
#  tm      val
#1  1 1.665882
#2  2 1.504545
#3  3 1.838000
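
In dplyr 1.0.0 and later, do() is marked as superseded. As a hedged sketch of the same idea, group_modify() can pass each group's rows to calc as a data frame. The result name res is only illustrative, and the printed values depend on your R version, because the default sample() algorithm changed in R 3.6.0.

```r
library(dplyr)

# Sample data from the question. The exact values depend on the R version,
# since the default sampling algorithm changed in R 3.6.0.
set.seed(42)
tm <- c(1, 2, 3, 3, 2, 1, 2, 3, 1, 1)
d <- as.numeric(sample(0:2, size = 10, replace = TRUE))
t <- as.numeric(sample(0:2, size = 10, replace = TRUE))
h <- as.numeric(sample(0:2, size = 10, replace = TRUE))
df <- data.frame(tm, d, t, h)
df$p <- rowSums(df[2:4])

# calc expects a data frame with columns d, t, h and p
calc <- function(x) {
  (1.27 * sum(x$d) + 1.62 * sum(x$t) + 2.10 * sum(x$h)) / sum(x$p)
}

# group_modify() calls the formula once per group; .x holds that group's
# rows (without the grouping column tm) and must be mapped to a data frame
res <- df %>%
  group_by(tm) %>%
  group_modify(~ data.frame(val = calc(.x)))

res
```

If you are on dplyr 1.1.0 or later, summarise(val = calc(pick(everything()))) should give an equivalent result, since pick() hands the current group's columns to calc as a data frame.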

If we change the function slightly so that it takes the columns as separate arguments, it also works with summarise:

calc1 <- function(d1, t1, h1, p1) {
  (1.27*sum(d1) + 1.62*sum(t1) + 2.10*sum(h1)) / sum(p1)
}
df %>%
   group_by(tm) %>%
   summarise(val=calc1(d, t, h, p))
#  tm      val
#1  1 1.665882
#2  2 1.504545
#3  3 1.838000
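
As for the error in the question: aggregate() calls FUN on each column of each group separately, so calc receives a plain numeric vector and data$d fails. For comparison, here is a minimal base R sketch that keeps calc unchanged by splitting the data frame first. The names w and res are only illustrative, and no output is shown because the sampled values depend on the R version.

```r
# Sample data from the question. The exact values depend on the R version,
# since the default sampling algorithm changed in R 3.6.0.
set.seed(42)
tm <- c(1, 2, 3, 3, 2, 1, 2, 3, 1, 1)
d <- as.numeric(sample(0:2, size = 10, replace = TRUE))
t <- as.numeric(sample(0:2, size = 10, replace = TRUE))
h <- as.numeric(sample(0:2, size = 10, replace = TRUE))
df <- data.frame(tm, d, t, h)
df$p <- rowSums(df[2:4])

# calc expects a data frame with columns d, t, h and p
calc <- function(x) {
  (1.27 * sum(x$d) + 1.62 * sum(x$t) + 2.10 * sum(x$h)) / sum(x$p)
}

# split() returns one data frame per value of tm;
# sapply() then runs calc on each of them and returns a named vector
w <- sapply(split(df, df$tm), calc)

# Reshape the named vector into the tm / w layout asked for in the question
res <- data.frame(tm = as.numeric(names(w)), w = unname(w))
res
```

The split/sapply pattern works here because calc needs several columns at once, which is exactly what aggregate() does not provide.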
