Replace loop with an *apply alternative


Question

I am trying to speed up my code by replacing some lookup loops with tapply (see "How to do vlookup and fill down (like in Excel) in R?"), and I stumbled upon this piece of code:

DF<-data.frame(id=c(rep("A", 5),rep("B", 7),rep("C", 9)), series=NA, chi=c(letters[1:5], LETTERS[6:12], letters[13:21]))
for (i in unique(DF$id)){
  DF$series[ DF$id==i ]<-1:length(DF$id[ DF$id==i ])
}
DF

Is it possible to replace this with an *apply family function? Or any other way to speed it up?

Solution

You may try ave:

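# ave() returns a vector of the same type as its first argument,
# so an integer vector is passed to get an integer counter back
DF$series <- ave(seq_along(DF$id), DF$id, FUN = seq_along)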
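
As a quick sanity check, the sketch below runs the loop from the question and an ave-based counter on the same shuffled ids and compares the results. The helper names series_loop and series_ave are hypothetical and do not appear in the original post. Passing seq_along(id) as the first argument to ave is an assumption made here so that the counter comes back as an integer vector.

```r
# Hypothetical helpers for illustration only.

# Loop version, following the logic of the loop in the question.
series_loop <- function(id) {
  series <- integer(length(id))
  for (i in unique(id)) {
    idx <- which(id == i)
    series[idx] <- seq_along(idx)   # 1, 2, 3, ... within each id
  }
  series
}

# ave() version. ave() returns a vector of the same type as its first
# argument, so an integer vector is passed to get integer counters.
series_ave <- function(id) {
  ave(seq_along(id), id, FUN = seq_along)
}

set.seed(1)
# Same group sizes as in the question, but shuffled so the ids are not sorted.
id <- sample(c(rep("A", 5), rep("B", 7), rep("C", 9)))

res_loop <- series_loop(id)
res_ave  <- series_ave(id)

identical(res_loop, res_ave)   # expected to be TRUE
```

Because ave writes each group's result back into the original positions, the counter stays aligned with the rows even when the ids are not sorted.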

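For larger data sets, though, dplyr is faster: in the benchmark below, on 100,000 rows, the dplyr version has a median of about 5 ms versus about 51 ms for ave.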

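library(dplyr)
library(microbenchmark)  # provides microbenchmark(), used below

# base R: per-group counter via ave()
fun_ave <- function(df) transform(df, series = ave(seq_along(id), id, FUN = seq_along))

# dplyr: group by id, then number the rows within each group
fun_dp <- function(df) df %>%
                 group_by(id) %>%
                 mutate(
                   series = seq_along(id))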

df <- data.frame(id= sample(letters[1:3], 100000, replace = TRUE))

microbenchmark(fun_ave(df))
# Unit: milliseconds
#        expr      min       lq   median      uq      max neval
# fun_ave(df) 38.59112 39.40802 50.77921 51.2844 128.6791   100


microbenchmark(fun_dp(df))
# Unit: milliseconds
#       expr      min       lq   median       uq      max neval
# fun_dp(df) 4.977035 5.034244 5.060663 5.265173 17.16018   100
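
The timings above are the ones reported in the answer and will differ between machines and package versions. To get a rough comparison without installing any packages, something like the following base R sketch could be used. The names fun_loop, fun_ave2 and time_ms are hypothetical. system.time is much coarser than microbenchmark, so treat the numbers as indicative only.

```r
set.seed(42)
df <- data.frame(id = sample(letters[1:3], 100000, replace = TRUE),
                 stringsAsFactors = FALSE)

# Loop from the question, wrapped in a function (hypothetical name).
fun_loop <- function(df) {
  df$series <- NA_integer_
  for (i in unique(df$id)) {
    sel <- df$id == i
    df$series[sel] <- seq_len(sum(sel))
  }
  df
}

# ave() version, with an integer first argument so the counter is integer.
fun_ave2 <- function(df) {
  df$series <- ave(seq_along(df$id), df$id, FUN = seq_along)
  df
}

# Hypothetical helper: median elapsed time in milliseconds over a few runs.
time_ms <- function(f, data, times = 10) {
  elapsed <- vapply(seq_len(times),
                    function(k) system.time(f(data))[["elapsed"]],
                    numeric(1))
  median(elapsed) * 1000
}

# Rough timings; no output is shown because it depends on the machine.
c(loop = time_ms(fun_loop, df), ave = time_ms(fun_ave2, df))

# Both versions should produce the same data frame.
identical(fun_loop(df), fun_ave2(df))
```

With only three distinct ids the loop runs just three iterations, so the gap between the two base R versions may be small. The loop gets more expensive as the number of groups grows.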
