R: Pass data.frame by reference to a function

Views: 232 · Published: 2020/10/16 22:40:13 · Tags: r, dataframe, pass-by-reference


Question

I pass a data.frame as a parameter to a function that needs to alter the data inside:

x <- data.frame(value=c(1,2,3,4))
f <- function(d){
  for(i in seq_len(nrow(d))) {
    if(d$value[i] %% 2 == 0){
      d$value[i] <- 0
    }
  }
  print(d)
}

When I execute f(x), I can see how the data.frame inside gets modified:

> f(x)
  value
1     1
2     0
3     3
4     0

However, the original data.frame I passed is unmodified.
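A quick check makes this visible. The following is only a minimal sketch that repeats the x and f defined above (with seq_len in place of 1:nrow) and then inspects x after the call:

```r
x <- data.frame(value = c(1, 2, 3, 4))

f <- function(d) {
  for (i in seq_len(nrow(d))) {
    if (d$value[i] %% 2 == 0) {
      d$value[i] <- 0   # changes the function's local copy only
    }
  }
  print(d)
}

f(x)             # prints the modified local copy
print(x$value)   # the caller's x still holds 1 2 3 4
```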

I usually work around this by returning the modified data.frame:

f <- function(d){
  for(i in seq_len(nrow(d))) {
    if(d$value[i] %% 2 == 0){
      d$value[i] <- 0
    }
  }
  d
}

And then I call the function, reassigning the result:

> x <- f(x)
> x
  value
1     1
2     0
3     3
4     0

However, I wonder what the effect of this behaviour is on a very large data.frame: is a new one created for the function execution? What is the R-ish way of doing this?

Is there a way to modify the original one without creating another one in memory?

Recommended answer

Actually, in R (almost) every modification is performed on a copy of the previous data (copy-on-modify behaviour).
So, for example, inside your function, when you do d$value[i] <- 0, some copies are actually created. You usually won't notice this because it is well optimized, but you can trace it with the tracemem function.
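As a rough illustration of tracing those copies, here is a minimal sketch. It assumes an R build with memory profiling enabled (which is what capabilities("profmem") reports); the simplified f below is only for demonstration and is not the function from the question:

```r
x <- data.frame(value = c(1, 2, 3, 4))

# Simplified, illustrative function: modifies one element and returns the result
f <- function(d) {
  d$value[2] <- 0   # modifying the argument triggers a copy
  d
}

if (capabilities("profmem")) {
  tracemem(x)       # start reporting whenever x is duplicated
  y <- f(x)         # "tracemem[...]" messages are printed for each copy
  untracemem(x)     # stop tracing
} else {
  y <- f(x)         # tracemem is unavailable in this build; just run the call
}
```

The exact number of reported copies can differ between R versions, so the messages are best read as a qualitative signal rather than an exact count.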

That being said, if your data.frame is not really big, you can stick with your function returning the modified object, since it's just one more copy after all.

But if your dataset is really big and making a copy every time would be really expensive, you can use data.table, which allows in-place modifications, e.g.:

library(data.table)
d <- data.table(value=c(1,2,3,4))
f <- function(d){
  for(i in seq_len(nrow(d))) {
    if(d$value[i] %% 2 == 0){
      # set() is data.table's by-reference assignment: row i, column 1 gets 0
      set(d, i, 1L, 0) # see also ?`:=`
    }
  }
  print(d)
}

f(d)
print(d)

# results :
> f(d)
   value
1:     1
2:     0
3:     3
4:     0
> 
> print(d)
   value
1:     1
2:     0
3:     3
4:     0

In this specific case, the loop can be replaced with a "vectorized" and more efficient version, e.g.:

d[d$value %% 2 == 0,'value'] <- 0

but maybe your real loop code is much more convoluted and cannot be vectorized easily.
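For the case that can be vectorized, data.table's `:=` operator (mentioned in the code comment above) should allow the same update by reference. The following is a minimal sketch, assuming the data.table package is installed; zero_even is a hypothetical helper name chosen for illustration:

```r
library(data.table)

d <- data.table(value = c(1, 2, 3, 4))

# Hypothetical helper: sets even values to 0, modifying dt in place
zero_even <- function(dt) {
  dt[value %% 2 == 0, value := 0]   # := assigns by reference, no reassignment needed
  invisible(dt)
}

zero_even(d)      # the return value is not reassigned
print(d$value)    # d itself has been updated
```

Because the table is changed in place, the caller does not need the x <- f(x) pattern here.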
