freeing up memory in R
In R, I am trying to combine and convert several sets of timeseries data into an xts object, using data from http://www.truefx.com/?page=downloads. However, the files are large and there are many of them, which is causing me issues on my laptop. They are stored as CSV files which have been compressed into ZIP files.
Downloading them and unzipping them is easy enough (although it takes up a lot of space on the hard drive).
Loading the 350MB+ file for one month's worth of data into R is reasonably straightforward with the new fread() function in the data.table package.
Some data.table transformations are done (inside a function) so that the timestamps can be read easily and a mid column is produced. Then the data.table is saved as an RData file on the hard drive, all references to the data.table object are removed from the workspace, and gc() is run after removal. However, when I look at the R session in my Activity Monitor (running on a Mac), it still appears to be taking up almost 1GB of RAM, and things seem a bit laggy. I was intending to load several years' worth of the CSV files at the same time, convert them to usable data.tables, combine them, and then create a single xts object, which seems infeasible if just one month uses 1GB of RAM.
I know I can sequentially download each file, convert it, save it, shut down R, and repeat until I have a bunch of RData files that I can just load and bind, but I was hoping there might be a more efficient way to do this, so that after removing all references to a data.table, RAM usage returns to normal or startup levels. Are there better ways of clearing memory than gc()? Any suggestions would be greatly appreciated.
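The per-month workflow described above can be sketched as follows. This is a sketch only: the bid/ask column names and the mid calculation are assumptions about the TrueFX schema, and the file names are hypothetical.

```r
# Sketch of the per-month workflow described in the question.
# Assumptions (not from the original post): the CSV carries bid/ask
# columns, and "mid" is derived from them; file names are hypothetical.
library(data.table)

process_month <- function(csv_file, rdata_file) {
  dt <- fread(csv_file)             # fast CSV reader from data.table
  dt[, mid := (bid + ask) / 2]      # produce the mid column
  save(dt, file = rdata_file)       # persist the converted table
  rm(dt)                            # remove the last reference ...
  invisible(gc())                   # ... then trigger garbage collection
}

# Later, the saved months can be loaded and bound into one table:
bind_months <- function(rdata_files) {
  rbindlist(lapply(rdata_files, function(f) {
    load(f)  # restores the object named `dt`
    dt
  }))
}
```

Note that even after rm() and gc(), the resident size reported by Activity Monitor may not shrink immediately, since R does not always return freed pages to the operating system.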
In my project I had to deal with many large files. I organized the routine on the following principles:

- Isolate memory-hungry operations in separate R scripts.
- Run each script in a new process which is destroyed after execution. Thus the system gets the used memory back.
- Pass parameters to the scripts via a text file.

Consider the toy example below.
Data generation:
setwd("/path/to")
write.table(matrix(1:5e7, ncol=10), "temp.csv") # 465.2 MB file
slave.R - memory-consuming part
setwd("/path/to")
library(data.table)
# simple processing
f <- function(dt){
dt <- dt[1:nrow(dt),]
dt[,new.row:=1]
return (dt)
}
# reads parameters from file
csv <- read.table("io.csv")
infile <- as.character(csv[1,1])
outfile <- as.character(csv[2,1])
# memory-hungry operations
dt <- as.data.table(read.csv(infile))
dt <- f(dt)
write.table(dt, outfile)
master.R - executes slaves in separate processes
setwd("/path/to")
# 3 files processing
for(i in 1:3){
# sets iteration-specific parameters
csv <- c("temp.csv", paste("temp", i, ".csv", sep=""))
write.table(csv, "io.csv")
# executes slave process
system("R -f slave.R")
}
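As a variation on the io.csv mechanism above (my own suggestion, not part of the original answer), the file names can also be passed as command-line arguments and read back in the slave with commandArgs(); parse_args below is a hypothetical helper.

```r
# Variation on the io.csv mechanism: pass the input and output file
# names as command-line arguments instead of via a parameter file.
# parse_args is a hypothetical helper for the slave.R side.
parse_args <- function(args = commandArgs(trailingOnly = TRUE)) {
  stopifnot(length(args) == 2)            # expect exactly infile and outfile
  list(infile = args[1], outfile = args[2])
}

# The master.R side would then launch the slave as, e.g.:
# system(paste("Rscript slave.R temp.csv", paste0("temp", i, ".csv")))
```

This avoids the extra write.table/read.table round trip per iteration, at the cost of the arguments being visible in the process list.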