Trying to avoid for loop with sapply (for gsub)


Question

I'm trying to avoid the for loop in the following code by using sapply, if at all possible. The loop solution works perfectly fine for me; I'm just trying to learn more R and explore as many methods as possible.

Objective: I have a vector i and two vectors, sf (search for) and rp (replace). For each element of i, I need to loop over sf and replace every match with the corresponding element of rp.

i  = c("1 6 5 4","7 4 3 1")
sf = c("1","2","3")
rp = c("one","two","three")

funn <- function(i) {
  for (j in seq_along(sf)) i = gsub(sf[j],rp[j],i,fixed=T)
  return(i)
}
print(funn(i))

Result (correct):

[1] "one 6 5 4"     "7 4 three one"

I'd like to do the very same, but with sapply:

#Trying to avoid a for loop in a fun
#funn1 <- function(i) {
#  i = gsub(sf,rp,i,fixed=T)
#  return(i)
#}
#print(sapply(i,funn1))

Apparently, the commented-out code above does not work: gsub uses only the first element of sf when the pattern has length greater than one. This is my first time using sapply, so I'm not exactly sure how to convert an "inner" implicit loop into a vectorized solution. Any help (even a statement that this is not possible) is appreciated!

(I'm aware of mgsub, but that is not the solution I'm after here. I would like to keep gsub.)
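One way to see why sapply does not fit naturally: each replacement has to run on the output of the previous one, which is a fold rather than an independent per-element map. A minimal sketch of that idea in base R, assuming the sample data above and using Reduce with an explicit starting value (this is only an illustration, not necessarily faster than the loop):

```r
# Sample data from the question
i  <- c("1 6 5 4", "7 4 3 1")
sf <- c("1", "2", "3")
rp <- c("one", "two", "three")

# Fold over the indices of sf: start from i and apply one
# fixed-string replacement per step, feeding each result forward.
result <- Reduce(
  function(x, j) gsub(sf[j], rp[j], x, fixed = TRUE),
  seq_along(sf),
  i  # initial value passed as Reduce's 'init' argument
)
print(result)
# [1] "one 6 5 4"     "7 4 three one"
```

This is the same sequence of gsub calls as the for loop, just expressed as a reduction.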

EDIT: full code with packages, the solutions offered below, and timings:

#timing
library(microbenchmark)
library(functional)

i  = rep(c("1 6 5 4","7 4 3 1"),10000)
sf = rep(c("1","2","3"),100)
rp = rep(c("one","two","three"),100)

#Loop
funn <- function(i) {
  for (j in seq_along(sf)) i = gsub(sf[j],rp[j],i,fixed=T)
  return(i)
}
t1 = proc.time()
k = funn(i)
t2 = proc.time()

#print(k)

print(microbenchmark(funn(i),times=10))

#mapply (note: <<- overwrites the global i, and gsub runs without fixed=TRUE here)
t3 = proc.time()
mapply(function(u,v) i<<-gsub(u,v,i), sf, rp)
t4 = proc.time()

#print(i)

print(microbenchmark(mapply(function(u,v) i<<-gsub(u,v,i), sf, rp),times=10))

#Curry
t5 = proc.time()
Reduce(Compose, Map(function(u,v) Curry(gsub, pattern=u, replacement=v), sf, rp))(i)
t6 = proc.time()

print(microbenchmark(Reduce(Compose, Map(function(u,v) Curry(gsub, pattern=u, replacement=v), sf, rp))(i), times=10))

#4th option
n <- length(sf)
sf <- setNames(sf,1:n)
rp <- setNames(rp,1:n)

t7 = proc.time()
Reduce(function(x,j) gsub(sf[j],rp[j],x,fixed=TRUE),c(list(i),as.list(1:n)))
t8 = proc.time()

print(microbenchmark(Reduce(function(x,j) gsub(sf[j],rp[j],x,fixed=TRUE),c(list(i),as.list(1:n))),times=10))

#Usual proc.time
print(t2-t1)
print(t4-t3)
print(t6-t5)
print(t8-t7)

Times:

Unit: milliseconds
    expr min  lq mean median  uq max neval
 funn(i) 143 143  149    145 147 165    10
Unit: seconds
                                               expr min  lq mean median  uq max neval
 mapply(function(u, v) i <<- gsub(u, v, i), sf, rp) 4.1 4.2  4.4    4.3 4.4 4.9    10
Unit: seconds
                                                                                           expr min  lq mean median  uq max neval
 Reduce(Compose, Map(function(u, v) Curry(gsub, pattern = u, replacement = v),      sf, rp))(i) 1.6 1.6  1.7    1.7 1.7 1.7    10
Unit: milliseconds
                                                                                      expr min  lq mean median  uq max neval
 Reduce(function(x, j) gsub(sf[j], rp[j], x, fixed = TRUE), c(list(i),      as.list(1:n))) 141 144  147    145 146 162    10
   user  system elapsed 
   0.15    0.00    0.15 
   user  system elapsed 
   4.49    0.03    4.52 
   user  system elapsed 
   1.68    0.02    1.68 
   user  system elapsed 
   0.19    0.00    0.18 

So, indeed, in this case the for loop offers the best timing (the index-based Reduce is about equally fast) and is, in my opinion, the most straightforward, simple, and possibly elegant. Sticking with the loop.

Thanks to all. All suggestions accepted and upvoted.

Solution

One approach, whose advantage is conciseness but which is clearly not functional-programming oriented, since it has the side effect of modifying i:

mapply(function(u,v) i<<-gsub(u,v,i), sf, rp)
#> i
#[1] "one 6 5 4"     "7 4 three one"

Or here is a pure functional programming approach:

library(functional)
Reduce(Compose, Map(function(u,v) Curry(gsub, pattern=u, replacement=v), sf, rp))(i)
#[1] "one 6 5 4"     "7 4 three one"

What it does: Map(function(u,v) Curry(gsub, pattern=u, replacement=v), sf, rp) builds a list of functions which will respectively replace 1 with one, 2 with two, etc. These functions are then composed and applied to i, giving the desired result.
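For readers without the functional package, the same idea can be sketched in base R with plain closures. The helper names make_replacer and compose2 are hypothetical (they are not part of any package), and fixed = TRUE is added here as an assumption to match the loop version in the question:

```r
i  <- c("1 6 5 4", "7 4 3 1")
sf <- c("1", "2", "3")
rp <- c("one", "two", "three")

# Hypothetical helper: returns a function that replaces u with v.
make_replacer <- function(u, v) {
  force(u); force(v)  # evaluate now so each closure keeps its own pair
  function(x) gsub(u, v, x, fixed = TRUE)
}

# Hypothetical helper: compose two functions, applying f first, then g.
compose2 <- function(f, g) {
  force(f); force(g)
  function(x) g(f(x))
}

# One replacer per (search, replacement) pair, chained into one function.
replacers <- Map(make_replacer, sf, rp)
pipeline  <- Reduce(compose2, replacers)

out <- pipeline(i)
print(out)
# [1] "one 6 5 4"     "7 4 three one"
```

Map plays the role of building the list of partially applied gsub calls, and Reduce with compose2 stands in for Reduce(Compose, ...).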
