R: Pass data.frame by reference to a function
Question
I pass a data.frame as a parameter to a function that wants to alter the data inside:
x <- data.frame(value=c(1,2,3,4))
f <- function(d){
  for(i in 1:nrow(d)) {
    if(d$value[i] %% 2 == 0){
      d$value[i] <- 0
    }
  }
  print(d)
}
When I execute f(x), I can see how the data.frame inside gets modified:
> f(x)
  value
1     1
2     0
3     3
4     0
However, the original data.frame I passed is unmodified:
> x
  value
1     1
2     2
3     3
4     4
Usually I have overcome this by returning the modified one:
f <- function(d){
  for(i in 1:nrow(d)) {
    if(d$value[i] %% 2 == 0){
      d$value[i] <- 0
    }
  }
  d
}
And then calling the function and reassigning the result:
> x <- f(x)
> x
  value
1     1
2     0
3     3
4     0
However, I wonder what the effect of this behaviour is on a very large data.frame: is a new one created for the function execution? What is the R-ish way of doing this?
Is there a way to modify the original one without creating another one in memory?
Recommended answer
Actually, in R (almost) every modification is performed on a copy of the previous data (copy-on-write behavior).
So, for example, inside your function, when you do d$value[i] <- 0, some copies are actually created. You usually won't notice this since it's well optimized, but you can trace it with the tracemem function.
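As a rough illustration of the copy-on-write behavior described above, here is a minimal sketch using tracemem. It assumes an R build with memory profiling enabled (hence the capabilities("profmem") guard); the exact number and format of the messages printed vary between R versions.

```r
x <- data.frame(value = c(1, 2, 3, 4))

# tracemem() is only available when R was built with memory profiling
can_trace <- capabilities("profmem")
if (can_trace) tracemem(x)   # report whenever this object gets duplicated

f <- function(d) {
  d$value[2] <- 0            # the write makes R copy d; tracemem reports it
  d
}

y <- f(x)

if (can_trace) untracemem(x)

print(x$value)               # the caller's object is untouched
print(y$value)               # only the returned copy carries the change
```

On a build with memory profiling, the call to f(x) should print one or more tracemem lines, while x itself keeps its original values.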
That being said, if your data.frame is not really big, you can stick with your function returning the modified object, since it's just one more copy after all.
But if your dataset is really big and making a copy every time would be really expensive, you can use data.table, which allows in-place modification, e.g.:
library(data.table)
d <- data.table(value=c(1,2,3,4))
f <- function(d){
  for(i in 1:nrow(d)) {
    if(d$value[i] %% 2 == 0){
      set(d,i,1L,0) # special function of data.table (see also ?`:=` )
    }
  }
  print(d)
}
f(d)
print(d)
# results :
> f(d)
   value
1:     1
2:     0
3:     3
4:     0
>
> print(d)
   value
1:     1
2:     0
3:     3
4:     0
NB
In this specific case, the loop can be replaced with a "vectorized" and more efficient version, e.g.:
d[d$value %% 2 == 0,'value'] <- 0
but maybe your real loop code is much more convoluted and cannot be vectorized easily.
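For completeness, the vectorized update can also be done by reference with data.table's `:=` operator (the one mentioned in the code comment above). This is only a sketch: zero_even is a hypothetical helper name, and it assumes the data.table package is installed.

```r
library(data.table)

d <- data.table(value = c(1, 2, 3, 4))

# Hypothetical helper: sets even values to 0, modifying `dt` in place
zero_even <- function(dt) {
  dt[value %% 2 == 0, value := 0]  # := updates the column by reference
  invisible(dt)                    # return invisibly; the caller's object is already changed
}

zero_even(d)   # no reassignment needed
print(d)       # the even values in d are now 0
```

Because `:=` changes the table the caller passed in, there is no need to write d <- zero_even(d); the flip side is that any other variable pointing at the same data.table sees the change too.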