Does R support 8-bit variables?

Question

I am trying to read a large (~700Mb) .csv file into R.

The file contains an array of integers less than 256, with a header row and 2 header columns.

I am using:

trainSet <- read.csv(trainFileName)

This eventually fails with:

Loading Data...
R(2760) malloc: *** mmap(size=151552) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
R(2760) malloc: *** mmap(size=151552) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Error: cannot allocate vector of size 145 Kb
Execution halted

Looking at the memory usage, it conks out at about 3Gb usage on a 6Gb machine with zero page file usage at the time of the crash, so there may be another way to fix it.

If I use:

trainSet <- read.csv(trainFileName, header=TRUE, nrows=100)
classes <- sapply(trainSet, class)

I can see that all the columns are being loaded as "integer" which I think is 32 bits.

Clearly using 3Gb to load a part of a 700Mb .csv file is far from efficient. I wonder if there's a way to tell R to use 8 bit numbers for the columns? This is what I've done in the past in Matlab and it's worked a treat, however, I can't seem to find anywhere a mention of an 8 bit type in R.

Does it exist?

Thanks in advance for any help.

Recommended answer

From the memory usage notes in ?read.csv:


 These functions can use a surprising amount of memory when reading
 large files.  There is extensive discussion in the ‘R Data
 Import/Export’ manual, supplementing the notes here.

 Less memory will be used if ‘colClasses’ is specified as one of
 the six atomic vector classes.  This can be particularly so when
 reading a column that takes many distinct numeric values, as
 storing each distinct value as a character string can take up to
 14 times as much memory as storing it as an integer.

 Using ‘nrows’, even as a mild over-estimate, will help memory
 usage.
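
The two hints quoted above can be combined with the 100-row trick from the question. The following is only a minimal sketch: the small temporary file and its column names (`id`, `label`, `px`) are hypothetical stand-ins for the real training set, and the row counts are chosen for illustration.

```r
# Hypothetical small file standing in for the real training set.
trainFileName <- tempfile(fileext = ".csv")
write.csv(
  data.frame(id    = 1:5,
             label = c(3L, 7L, 1L, 0L, 9L),
             px    = c(0L, 255L, 12L, 7L, 200L)),
  trainFileName, row.names = FALSE
)

# Read only a few rows to discover the class of each column.
firstRows <- read.csv(trainFileName, header = TRUE, nrows = 3)
classes <- sapply(firstRows, class)

# Reuse those classes for the full read. nrows may be a mild over-estimate.
trainSet <- read.csv(trainFileName, header = TRUE,
                     colClasses = classes, nrows = 10)
str(trainSet)
```

Passing `colClasses` spares `read.csv` from first holding every field as a character string, and `nrows` lets it size its buffers up front. Classes inferred from a short sample can be wrong if later rows contain, say, a decimal or a missing-value code, so the sample is worth checking against what is known about the file.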


This example takes a while to run because of I/O, even with my SSD, but there are no memory issues:

R> # In one R session
R> x <- matrix(sample(256,2e8,TRUE),ncol=2)
R> write.csv(x,"700mb.csv",row.names=FALSE)

R> # In a new R session
R> x <- read.csv("700mb.csv", colClasses=c("integer","integer"),
+ header=TRUE, nrows=1e8)
R> gc()
            used  (Mb) gc trigger   (Mb)  max used   (Mb)
Ncells    173632   9.3     350000   18.7    350000   18.7
Vcells 100276451 765.1  221142070 1687.2 200277306 1528.0
R> # Max memory used ~1.5Gb
R> print(object.size(x), units="Mb")
762.9 Mb
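
On the literal question of an 8-bit type: R's atomic vector types include `raw`, which stores one byte per element and so can hold values from 0 to 255. The sketch below is an illustration, not part of the example above. It assumes the data has already been read as integers and that every value fits in a byte. The variable names are made up for the illustration.

```r
# Values in 0..255 fit in a raw vector, which uses one byte per element.
ints  <- sample(0:255, 1e6, replace = TRUE)  # stored as 32-bit integers
bytes <- as.raw(ints)                        # stored as 8-bit raw values

print(object.size(ints),  units = "Mb")
print(object.size(bytes), units = "Mb")

# Raw values do not support arithmetic directly; convert back before use.
roundTrip <- as.integer(bytes)
stopifnot(identical(roundTrip, ints))
```

The trade-off is that `raw` has no missing-value representation and must be converted back with `as.integer()` before any arithmetic, so it mostly helps when the data is only stored or passed around rather than computed on.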
