Why does ff still store data in RAM?

Problem description

Using the ff package in R, I imported a csv file into an ffdf object, but was surprised to find that the object occupied about 700 MB of RAM. Isn't ff supposed to keep data on disk rather than in RAM? Did I do something wrong? I am a novice in R; any advice is appreciated. Thanks.

> training.ffdf <- read.csv.ffdf(file="c:/temp/training.csv", header=T)
> # [Edit: the csv file is conceptually a large data frame consisting
> # of heterogeneous types of data --- some integers and some character
> # strings.]
>
> # The ffdf object occupies 718MB!!!
> object.size(training.ffdf)
753193048 bytes
Warning messages:
1: In structure(.Internal(object.size(x)), class = "object_size") :
  Reached total allocation of 1535Mb: see help(memory.size)
2: In structure(.Internal(object.size(x)), class = "object_size") :
  Reached total allocation of 1535Mb: see help(memory.size)
>
> # Shouldn't biglm be able to process data in small chunks?!
> fit <- biglm(y ~ as.factor(x), data=training.ffdf)
Error: cannot allocate vector of size 18.5 Mb

Edit: I followed Tommy's advice, omitted the object.size call, and looked at Task Manager instead (I ran R on a Windows XP machine with 4 GB of RAM). I saved the object with ffsave, closed R, reopened it, and loaded the data from the file. The problem persisted:

> library(ff); library(biglm)
> # At this point RGui.exe had used up 26176 KB of memory
> ffload(file="c:/temp/trainingffimg")
> # Now 701160 KB
> fit <- biglm(y ~ as.factor(x), data=training.ffdf)
Error: cannot allocate vector of size 18.5 Mb

I also tried

> options("ffmaxbytes" = 402653184) # default = 804782080 B ~ 767.5 MB

but after loading the data, RGui still used more than 700 MB of memory, and the biglm regression still raised the same error.
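
For reference, one way to confirm that an ffdf really is disk-backed is to inspect its backing files and ff's cache settings. The following is a minimal sketch assuming the training.ffdf object from above; physical() exposes the ff vectors behind the ffdf's columns, and filename() reports their on-disk files:

> library(ff)
> # Each ffdf column is an ff vector backed by a file on disk:
> sapply(physical(training.ffdf), filename)
> # ff keeps only a bounded cache of that data in RAM:
> getOption("ffmaxbytes")    # overall RAM budget for ff's cache
> getOption("ffbatchbytes")  # batch size ff uses when processing in chunks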

Answer

You need to provide the data to biglm in chunks; see ?biglm. If you pass an ffdf object instead of a data.frame, you run into one of the following two problems:

  1. ffdf is not a data.frame, so something undefined happens, or
  2. the function you passed it to tries to convert the ffdf to a data.frame via as.data.frame(ffdf), which can easily exhaust your RAM; this is probably what happened to you.

Check ?chunk.ffdf for an example of how to pass chunks from ffdf to biglm.
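
As a minimal sketch of that pattern (assuming the column names y and x from the question's formula, and assuming the factor levels of x are consistent across chunks, which biglm's update() requires):

> library(ff); library(biglm)
> # chunk() splits the ffdf's rows into ranges small enough for RAM;
> # subsetting the ffdf with one range returns an ordinary data.frame.
> chunks <- chunk(training.ffdf)
> # Fit on the first chunk, then fold in the rest with update(),
> # which is how biglm accumulates the model incrementally:
> fit <- biglm(y ~ as.factor(x), data = training.ffdf[chunks[[1]], ])
> for (i in chunks[-1]) fit <- update(fit, moredata = training.ffdf[i, ])
> summary(fit)

Only one chunk is materialized in RAM at a time, so memory use stays bounded no matter how large the csv is.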
