What is the best way to avoid passing a data frame around?


Question

I have 12 data.frames to work with. They are similar, and I have to do the same processing on each one, so I wrote a function that takes a data.frame, processes it, and returns a data.frame. This works, but I am afraid that I am passing around a very big structure. I may be making temporary copies (am I?), which can't be efficient. What is the best way to avoid passing a data.frame around?

doSomething <- function(df) {
  # do something with the data frame, df
  return(df)
}


Recommended answer

You are, indeed, passing the object around and using some memory. But I don't think you can do an operation on an object in R without passing the object around. Even if you skipped the function and did your operations at the top level, R would behave basically the same way.
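As a minimal sketch of why this is so: R follows copy-on-modify semantics, so handing a data frame to a function does not by itself duplicate it; a copy appears only when the function changes its argument. The names readOnly and addOne below are hypothetical helpers made up for illustration, and the tracemem() calls are guarded because they only work on R builds compiled with memory profiling.

```r
original <- data.frame(x = c(1, 2, 3))

# Hypothetical helper that only reads its argument.
readOnly <- function(df) nrow(df)

# Hypothetical helper that modifies its argument, which forces a copy.
addOne <- function(df) {
  df$x <- df$x + 1
  df
}

# tracemem() reports duplications; it needs memory profiling support.
tracing <- capabilities("profmem")
if (tracing) invisible(tracemem(original))

n <- readOnly(original)        # reading alone should not report a copy
modified <- addOne(original)   # a copy is typically reported here when tracing

if (tracing) untracemem(original)

original$x   # the caller's data frame is unchanged
modified$x   # the returned data frame carries the change
```

The caller's object is never altered by the function, which is why the result has to be returned and assigned.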

The best way to see this is to set up an example. If you are on Windows, open Windows Task Manager. If you are on Linux, open a terminal window and run the top command. I'm going to assume Windows in this example. In R, run the following:

col1<-rnorm(1000000,0,1)
col2<-rnorm(1000000,1,2)
myframe<-data.frame(col1,col2)

rm(col1)
rm(col2)
gc()

This creates two vectors called col1 and col2, then combines them into a data frame called myframe. It then drops the vectors and forces garbage collection to run. Watch the memory usage of the Rgui.exe task in Windows Task Manager. When I start R, it uses about 19 MB of memory. After I run the commands above, R on my machine is using just under 35 MB.
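If you would rather not watch an external process monitor, the same quantities can be inspected roughly from inside R. This is only a sketch: object.size() gives R's own estimate of the object's size, and the gc() figure is R's accounting of vector memory, so neither number will match Task Manager or top exactly. The names frameSize and usedMb are illustrative.

```r
# Build the same two-column frame as in the example above.
myframe <- data.frame(col1 = rnorm(1000000, 0, 1),
                      col2 = rnorm(1000000, 1, 2))

# R's estimate of the memory held by the data frame itself.
frameSize <- object.size(myframe)
format(frameSize, units = "Mb")

# gc() returns a matrix; row "Vcells", column 2 is vector memory in use (Mb).
usedMb <- gc()["Vcells", 2]
usedMb
```

Two numeric columns of one million values at 8 bytes each come to roughly 16 million bytes, which is consistent with the jump from about 19 MB to about 35 MB described above.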

Now try the following:

myframe<-myframe+1

Your memory usage for R should jump to over 144 MB. If you force garbage collection using gc(), you will see it drop back to around 35 MB. To try this using a function, you can do the following:

doSomething <- function(df) {
  df <- df + 1 - 1
  return(df)
}
myframe <- doSomething(myframe)

When you run the code above, memory usage will jump to 160 MB or so. Running gc() will drop it back to 35 MB.

So what to make of all this? Well, doing an operation outside of a function is not that much more efficient (in terms of memory) than doing it in a function. Garbage collection cleans things up nicely. Should you force gc() to run? Probably not, since it will run automatically as needed. I only ran it above to show how it affects memory usage.
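Given that conclusion, one reasonable way to handle the 12 data frames from the question is to keep them in a list and apply the same function to each. This is a sketch only: the two small frames and the body of doSomething are placeholders standing in for the real data and the real processing.

```r
# Placeholder processing step; the real function would do the actual work.
doSomething <- function(df) {
  df$total <- df$col1 + df$col2
  df
}

# Two small stand-ins for the 12 similar data frames.
frames <- list(
  a = data.frame(col1 = 1:3, col2 = 4:6),
  b = data.frame(col1 = 7:9, col2 = 10:12)
)

# Apply the same function to every data frame; names are preserved.
processed <- lapply(frames, doSomething)
processed$a$total
```

Each call works on one data frame at a time, so temporary copies from earlier calls can be reclaimed by the garbage collector as the loop proceeds.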

I hope that helps!
