Why does ff still store data in RAM?
Question
Using the ff package in R, I imported a csv file into an ffdf object, but was surprised to find that the object occupied some 700MB of RAM. Isn't ff supposed to keep data on disk rather than in RAM? Did I do something wrong? I am a novice in R. Any advice is appreciated. Thanks.
> training.ffdf <- read.csv.ffdf(file="c:/temp/training.csv", header=T)
> # [Edit: the csv file is conceptually a large data frame consisting
> # of heterogeneous types of data --- some integers and some character
> # strings.]
>
> # The ffdf object occupies 718MB!!!
> object.size(training.ffdf)
753193048 bytes
Warning messages:
1: In structure(.Internal(object.size(x)), class = "object_size") :
Reached total allocation of 1535Mb: see help(memory.size)
2: In structure(.Internal(object.size(x)), class = "object_size") :
Reached total allocation of 1535Mb: see help(memory.size)
>
> # Shouldn't biglm be able to process data in small chunks?!
> fit <- biglm(y ~ as.factor(x), data=training.ffdf)
Error: cannot allocate vector of size 18.5 Mb
Edit: I followed the advice of Tommy, omitted the object.size call, and looked at Task Manager instead (I ran R on a Windows XP machine with 4GB of RAM). I saved the object with ffsave, closed R, reopened it, and loaded the data from file. The problem persisted:
> library(ff); library(biglm)
> # At this point RGui.exe had used up 26176 KB of memory
> ffload(file="c:/temp/trainingffimg")
> # Now 701160 KB
> fit <- biglm(y ~ as.factor(x), data=training.ffdf)
Error: cannot allocate vector of size 18.5 Mb
I also tried
> options("ffmaxbytes" = 402653184) # default = 804782080 B ~ 767.5 MB
but after loading the data, RGui still used more than 700MB of memory and the biglm regression still raised the same error.
Answer
You need to provide the data to biglm in chunks; see ?biglm. If you pass an ffdf object instead of a data.frame, you run into one of the following two problems:
- ffdf is not a data.frame, so something undefined happens
- the function you pass it to tries to convert the ffdf to a data.frame via as.data.frame(ffdf), which easily exhausts your RAM, and that is most likely what happened to you
Check ?chunk.ffdf for an example of how to pass chunks from an ffdf to biglm.
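Along the lines of the ?chunk.ffdf help page, the chunked fit could be sketched roughly as follows. This is an illustrative sketch (not the exact help-page code), assuming the training.ffdf object from above with columns y and x:

```r
library(ff)
library(biglm)

# Stream the regression: fit biglm on the first chunk, then fold in the
# remaining chunks with update(). chunk() splits the ffdf rows into
# ranges small enough to hold in RAM.
fit <- NULL
for (i in chunk(training.ffdf)) {
  d <- training.ffdf[i, ]  # materializes only this chunk as a data.frame
  if (is.null(fit)) {
    fit <- biglm(y ~ as.factor(x), data = d)
  } else {
    fit <- update(fit, d)
  }
}
summary(fit)
```

One caveat: as.factor(x) must produce the same set of levels in every chunk, or update() will complain about a mismatched model matrix; converting x to a factor with a fixed, precomputed set of levels before chunking avoids this.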