Freeing up memory in R


Problem description

In R, I am trying to combine and convert several sets of time series data into an xts object, using the data from http://www.truefx.com/?page=downloads. However, the files are large and there are many of them, so this is causing problems on my laptop. They are stored as csv files which have been compressed into zip files.



Downloading and unzipping them is easy enough (although it takes up a lot of space on the hard drive).
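(For reference, a minimal base-R download-and-unzip sketch; the URL and archive name below are placeholders, not actual TrueFX links:)

    # placeholder URL and archive name - substitute the real monthly archive link from truefx.com
    url <- "http://www.truefx.com/some/monthly/archive.zip"
    download.file(url, destfile = "archive.zip", mode = "wb")
    unzip("archive.zip", exdir = ".")   # extracts the csv file(s) next to the zip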

Loading the 350MB+ file for one month's worth of data into R is reasonably straightforward with the new fread() function in the data.table package.

Some data.table transformations are done (inside a function) so that the timestamps can be read easily and a mid column is produced. Then the data.table is saved as an RData file on the hard drive, all references to the data.table object are removed from the workspace, and gc() is run after removal. However, when I look at the R session in my Activity Monitor (running on a Mac), it still looks like it is taking up almost 1GB of RAM, and things seem a bit laggy. I was intending to load several years' worth of the csv files at the same time, convert them to usable data.tables, combine them and then create a single xts object, which seems infeasible if just one month uses 1GB of RAM.
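For context, a minimal sketch of that workflow, assuming hypothetical file names and a TrueFX-style bid/ask layout (the column names, timestamp format and transformations are assumptions, not taken from the question):

    library(data.table)

    # hypothetical input file; assumed columns: pair, timestamp, bid, ask
    dt <- fread("EURUSD-2013-01.csv")
    setnames(dt, c("pair", "timestamp", "bid", "ask"))

    # parse timestamps and add a mid column (assumed transformation)
    dt[, timestamp := as.POSIXct(timestamp, format = "%Y%m%d %H:%M:%OS", tz = "UTC")]
    dt[, mid := (bid + ask) / 2]

    save(dt, file = "EURUSD-2013-01.RData")  # persist the data.table
    rm(dt)                                   # drop the only reference
    gc()                                     # ask R to return the freed memory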



I know I can sequentially download each file, convert it, save it, shut down R and repeat until I have a bunch of RData files that I can just load and bind, but I was hoping there might be a more efficient way to do this, so that after removing all references to a data.table you get back to "normal" or startup levels of RAM usage. Are there better ways of clearing memory than gc()? Any suggestions would be greatly appreciated.

Solution

In my project I had to deal with many large files. I organized the routine around the following principles:


  1. Isolate the memory-hungry operations in separate R scripts.

  2. Run each script in a new process that is destroyed after execution, so the system gets the used memory back.

  3. Pass parameters to the scripts via a text file.

Consider the toy example below.

Data generation:

    setwd("/path/to")
    write.table(matrix(1:5e7, ncol=10), "temp.csv") # 465.2 MB file

slave.R - the memory-consuming part

    setwd("/path/to")
    library(data.table)

    # simple processing
    f <- function(dt){
      dt <- dt[1:nrow(dt),]
      dt[, new.row := 1]
      return(dt)
    }

    # read parameters from file
    csv <- read.table("io.csv")
    infile  <- as.character(csv[1,1])
    outfile <- as.character(csv[2,1])

    # memory-hungry operations
    dt <- as.data.table(read.csv(infile))
    dt <- f(dt)
    write.table(dt, outfile)

master.R - executes the slaves in separate processes

    setwd("/path/to")

    # process 3 files
    for(i in 1:3){
      # set iteration-specific parameters
      csv <- c("temp.csv", paste("temp", i, ".csv", sep=""))
      write.table(csv, "io.csv")

      # execute the slave process
      system("R -f slave.R")
    }
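As a variant (not part of the original answer), the same hand-off could be done with command-line arguments via Rscript and commandArgs(), avoiding the intermediate io.csv file. This is only a sketch, assuming Rscript is on the PATH:

    # master.R variant: pass the file names directly on the command line
    for (i in 1:3) {
      infile  <- "temp.csv"
      outfile <- paste("temp", i, ".csv", sep = "")
      # each slave still runs in its own process, so its memory is returned on exit
      system(paste("Rscript slave.R", infile, outfile))
    }

    # slave.R variant: read the parameters passed on the command line
    args    <- commandArgs(trailingOnly = TRUE)
    infile  <- args[1]
    outfile <- args[2]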



