R: run function over same dataframe multiple times

Question

I’m looking to apply a function over an initial dataframe multiple times. As a simple example, take this data:

library(dplyr)
thisdata <-  data.frame(vara = seq(from = 1, to = 20, by = 1)
                        ,varb = seq(from = 1, to = 20, by = 1))

And here is a simple function I would like to run over it:

simplefunc <- function(data) {
  datasetfinal2 <- data %>% mutate(varb = varb + 1)
  return(datasetfinal2)
}
thisdata2 <- simplefunc(thisdata)

thisdata3 <- simplefunc(thisdata2)

So, how would I run this function, say, 10 times without having to keep calling it by hand (e.g. to create thisdata3)? I'm mostly interested in the final dataframe after the replication, but it would be good to have a list of all the dataframes produced so I can run some diagnostics. Appreciate the help!

Recommended answer

Dealing with multiple identically-structured data.frames individually is a difficult way to manage things, especially if the number of iterations is more than a few. A popular "best practice" is to deal with a "list of data.frames", something like:

n <- 10 # list length: out[[1]] is the original, so this applies simplefunc n - 1 times
out <- vector("list", n)
out[[1]] <- thisdata
for (i in 2:n) out[[i]] <- simplefunc(out[[i-1]])

You can look at any interim value with

str(out[[10]])
# 'data.frame': 20 obs. of  2 variables:
#  $ vara: num  1 2 3 4 5 6 7 8 9 10 ...
#  $ varb: num  10 11 12 13 14 15 16 17 18 19 ...

and, as you might expect, the final result is in out[[n]].
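Note that the loop above stores the untouched input in out[[1]], so a list of length n reflects n - 1 applications of the function. As a minimal sketch, the same idea can be wrapped in a helper that applies the function exactly the requested number of times. The name apply_n_times is hypothetical, and simplefunc is rewritten here in base R (an assumption made only so the example runs without dplyr):

```r
# Base R stand-in for the question's simplefunc (assumption: avoids dplyr)
simplefunc <- function(data) {
  data$varb <- data$varb + 1
  data
}

thisdata <- data.frame(vara = seq(from = 1, to = 20, by = 1),
                       varb = seq(from = 1, to = 20, by = 1))

# Hypothetical helper: apply f exactly `times` times and keep every step
apply_n_times <- function(data, f, times) {
  out <- vector("list", times + 1)
  out[[1]] <- data                 # element 1 is the untouched input
  for (i in seq_len(times)) {
    out[[i + 1]] <- f(out[[i]])    # element i + 1 holds the result of i applications
  }
  out
}

steps <- apply_n_times(thisdata, simplefunc, 10)
final <- steps[[length(steps)]]    # data frame after all 10 applications
```

Using seq_len(times) rather than 2:n also keeps the loop safe when times is 0, in which case the list holds only the input.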

This can be simplified slightly using Reduce, and adding a throw-away second argument to simplefunc:

simplefunc <- function(data, ...) {
  datasetfinal2 <- data %>% mutate(varb = varb+1)
  return(datasetfinal2)
}
out <- Reduce(simplefunc, 1:10, init = thisdata, accumulate = TRUE)

This is effectively doing:

tmp <- simplefunc(thisdata, 1)
tmp <- simplefunc(tmp, 2)
tmp <- simplefunc(tmp, 3)
# ...

(In fact, if you look at the source for Reduce, it's effectively doing my first suggestion above.)

Note that if simplefunc has other arguments that cannot be dropped, you could place them after the ..., perhaps:

simplefunc <- function(data, ..., otherarg, anotherarg) {
  datasetfinal2 <- data %>% mutate(varb = varb+1)
  return(datasetfinal2)
}

though you must then change all other calls to simplefunc to pass those parameters by name instead of by position (the common default), because arguments declared after ... can only be matched by name.
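To illustrate the by-name requirement, here is a small sketch. The increment argument is hypothetical (it is not part of the original simplefunc), and the function body uses base R instead of dplyr so the example is self-contained:

```r
# Hypothetical variant of simplefunc with one extra argument placed after ...
simplefunc <- function(data, ..., increment = 1) {
  data$varb <- data$varb + increment
  data
}

thisdata <- data.frame(vara = seq(from = 1, to = 20, by = 1),
                       varb = seq(from = 1, to = 20, by = 1))

# A second positional argument is absorbed by ... and ignored,
# so increment keeps its default of 1
by_position <- simplefunc(thisdata, 5)

# Passing the argument by name is the only way to reach it
by_name <- simplefunc(thisdata, increment = 5)

# With Reduce, the counter lands in ... while the named argument is set in a wrapper
out <- Reduce(function(x, i) simplefunc(x, i, increment = 2),
              1:3, init = thisdata, accumulate = TRUE)
```

Here by_position$varb[1] is 2 while by_name$varb[1] is 6, which is the kind of silent difference that makes it worth checking existing calls after adding ... to a signature.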

Edit: if you cannot (or do not want to) edit simplefunc, you can always use an anonymous function to ignore the iterator/counter:

Reduce(function(x, ign) simplefunc(x), 1:10, init = thisdata, accumulate = TRUE)
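One detail worth knowing: with init supplied and accumulate = TRUE, Reduce returns the initial value followed by one result per element of 1:10, so the list has 11 data frames and the last one reflects all 10 applications. A minimal sketch of pulling out the final frame and running a simple diagnostic over every step follows; simplefunc is written in base R here as an assumption so the block runs without dplyr, and first_varb is just an illustrative name:

```r
# Base R stand-in for the question's simplefunc (assumption: avoids dplyr)
simplefunc <- function(data) {
  data$varb <- data$varb + 1
  data
}

thisdata <- data.frame(vara = seq(from = 1, to = 20, by = 1),
                       varb = seq(from = 1, to = 20, by = 1))

# The anonymous wrapper discards Reduce's counter, so simplefunc stays unchanged
out <- Reduce(function(x, ign) simplefunc(x), 1:10,
              init = thisdata, accumulate = TRUE)

length(out)                       # 11: the input plus 10 results
final <- out[[length(out)]]       # data frame after all 10 applications

# Example diagnostic: track the first value of varb across every step
first_varb <- vapply(out, function(d) d$varb[1], numeric(1))
```

Indexing with length(out) instead of a hard-coded number avoids an off-by-one mistake if the number of iterations changes later.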
