Faster way to create a variable that aggregates a column by id

Views: 82 Published: 2020/5/28 20:24:03 Tags: performance r aggregate plyr

Question

Is there a faster way to do this? I suspect this is unnecessarily slow and that a task like this can be accomplished with base functions.

library(plyr)  # provides ddply()
df <- ddply(df, "id", function(x) cbind(x, perc.total = sum(x$cand.perc)))

I'm quite new to R. I have looked at by(), aggregate() and tapply(), but either couldn't get them to work at all or couldn't get them to work the way I wanted. Rather than returning a shorter vector, I want to attach the sum to the original data frame. What is the best way to do this?

Here is a speed comparison of the answers, applied to my data.

> # My original solution
> system.time( ddply(df, "id", function(x) cbind(x, perc.total = sum(x$cand.perc))) )
   user  system elapsed 
 14.405   0.000  14.479 

> # Paul Hiemstra
> system.time( ddply(df, "id", transform, perc.total = sum(cand.perc)) )
   user  system elapsed 
 15.973   0.000  15.992 

> # Richie Cotton
> system.time( with(df, tapply(df$cand.perc, df$id, sum))[df$id] )
   user  system elapsed 
  0.048   0.000   0.048 

> # John
> system.time( with(df, ave(cand.perc, id, FUN = sum)) )
   user  system elapsed 
  0.032   0.000   0.030 

> # Christoph_J
> system.time( df[ , list(perc.total = sum(cand.perc)), by="id"][df])
   user  system elapsed 
  0.028   0.000   0.028
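
The asker's data frame is not shown, so the timings above cannot be reproduced exactly. As a rough sketch, the two base-R approaches can be compared on synthetic data. The data size, the number of ids, and the names `n_ids`, `by_tapply`, `by_ave`, `t_tapply` and `t_ave` are assumptions made for illustration only. One detail differs from the tapply() call above: if `id` is numeric, indexing with `[df$id]` selects by position rather than by name, so this sketch indexes with `as.character(df$id)` instead.

```r
# Synthetic stand-in for the asker's data (the real df is not shown in the post).
set.seed(1)
n_ids <- 1000                      # hypothetical number of groups
df <- data.frame(
  id        = sample(n_ids, 1e5, replace = TRUE),
  cand.perc = runif(1e5)
)

# tapply() returns one sum per id. Indexing that result by the id's *name*
# expands it back to one value per row of df.
t_tapply <- system.time(
  by_tapply <- as.vector(tapply(df$cand.perc, df$id, sum)[as.character(df$id)])
)

# ave() returns a vector that is already aligned with the rows of df.
t_ave <- system.time(
  by_ave <- ave(df$cand.perc, df$id, FUN = sum)
)

print(t_tapply)
print(t_ave)

# Both approaches should give the same per-row group totals.
stopifnot(isTRUE(all.equal(by_tapply, by_ave)))
```

Absolute timings depend on the machine and on the shape of the data, so only the relative order of magnitude is meaningful.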

Recommended answer

For any kind of aggregation where you want the result to be a vector of the same length as the input, with each group's aggregate value repeated for every row of that group, ave() is what you want.

df$perc.total <- ave(df$cand.perc, df$id, FUN = sum)
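
A small worked example may make the behaviour concrete. The column names follow the question, but the values are made up for illustration.

```r
# Toy data: column names follow the question, the values are invented.
df <- data.frame(
  id        = c("a", "a", "b", "b", "b"),
  cand.perc = c(10, 20, 5, 15, 30)
)

# ave() applies FUN within each id and returns one value per row,
# in the original row order, so it can be assigned straight to a new column.
df$perc.total <- ave(df$cand.perc, df$id, FUN = sum)

print(df)
# perc.total is 30 for both "a" rows and 50 for all three "b" rows
```

Note that FUN has to be passed by name, because ave() takes its grouping variables through `...`; an unnamed function would be treated as another grouping variable.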
