Does R support 8-bit variables?
Question
I am trying to read a large (~700Mb) .csv file into R.
The file contains an array of integers less than 256, with a header row and 2 header columns.
I use:
trainSet <- read.csv(trainFileName)
This eventually fails with:
Loading Data...
R(2760) malloc: *** mmap(size=151552) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
R(2760) malloc: *** mmap(size=151552) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Error: cannot allocate vector of size 145 Kb
Execution halted
Looking at the memory usage, R conks out at about 3Gb on a 6Gb machine, with zero page file usage at the time of the crash, so there may be another way to fix it.
If I use:
trainSet <- read.csv(trainFileName, header=TRUE, nrows=100)
classes <- sapply(trainSet, class)
I can see that all the columns are being loaded as "integer", which I think is 32 bits.
Clearly, using 3Gb to load part of a 700Mb .csv file is far from efficient. Is there a way to tell R to use 8-bit numbers for the columns? This is what I've done in the past in Matlab and it worked a treat, but I can't find any mention of an 8-bit type in R.
Does it exist?
Thanks in advance for any help.
Recommended answer
See the "Memory usage" section of ?read.csv:
These functions can use a surprising amount of memory when reading
large files. There is extensive discussion in the ‘R Data
Import/Export’ manual, supplementing the notes here.
Less memory will be used if ‘colClasses’ is specified as one of
the six atomic vector classes. This can be particularly so when
reading a column that takes many distinct numeric values, as
storing each distinct value as a character string can take up to
14 times as much memory as storing it as an integer.
Using ‘nrows’, even as a mild over-estimate, will help memory
usage.
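The advice quoted above can be sketched as follows. This is a minimal illustration, not the asker's actual code: the temporary file, its column names `a` and `b`, and the row counts are made up for the example. It peeks at a few rows to learn the column classes, then reads the file with `colClasses` and an over-estimated `nrows`.

```r
# Hypothetical small file standing in for the real training set
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(a = 1:5, b = c(0L, 17L, 255L, 3L, 42L)),
          tmp, row.names = FALSE)

# Peek at the first rows only, to learn the column classes cheaply
peek <- read.csv(tmp, header = TRUE, nrows = 2)
classes <- sapply(peek, class)

# Read everything with explicit classes; nrows is a mild over-estimate
dat <- read.csv(tmp, header = TRUE, colClasses = classes, nrows = 10)
str(dat)

unlink(tmp)
```

Classes inferred from a short peek are only a guess: if a later row holds a non-integer value in a column declared "integer", the read is likely to fail, so it may be safer to set the classes by hand when the file layout is known.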
This example takes a while to run because of I/O, even with my SSD, but there are no memory issues:
R> # In one R session
R> x <- matrix(sample(256,2e8,TRUE),ncol=2)
R> write.csv(x,"700mb.csv",row.names=FALSE)
R> # In a new R session
R> x <- read.csv("700mb.csv", colClasses=c("integer","integer"),
+ header=TRUE, nrows=1e8)
R> gc()
            used  (Mb) gc trigger   (Mb)  max used   (Mb)
Ncells    173632   9.3     350000   18.7    350000   18.7
Vcells 100276451 765.1  221142070 1687.2 200277306 1528.0
R> # Max memory used ~1.5Gb
R> print(object.size(x), units="Mb")
762.9 Mb
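The reported size matches 4 bytes per integer. As a side note that is not part of the answer above, R's `raw` type stores one byte per element, which is the closest base R comes to an 8-bit type. The sketch below assumes all values lie in 0..255 (the `sample(256, ...)` call in the example can produce 256, which would not fit) and uses a smaller vector with made-up variable names.

```r
# Expected size of the 2e8 integers in the example above: 4 bytes each
2e8 * 4 / 2^20   # roughly 762.9 Mb, in line with object.size(x)

# raw vectors hold one byte per element, so values 0..255 fit exactly
n <- 1e6
ints <- sample(0:255, n, replace = TRUE)   # integer storage, 4 bytes each
bytes <- as.raw(ints)                      # raw storage, 1 byte each

print(object.size(ints), units = "Mb")
print(object.size(bytes), units = "Mb")

# Arithmetic operators such as + are not defined for raw,
# so convert back to integer before computing
back <- as.integer(bytes)
```

The trade-off is that `raw` only helps with storage: any computation needs a conversion back to integer, and `read.csv` still parses the file into integers before they can be converted.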