R - Big Data - vector exceeds vector length limit

Question

I have the following R code:

data <- read.csv('testfile.data', header = TRUE)
mat <- as.matrix(data)

Some more statistics about my testfile.data:

> ncol(data)
[1] 75713
> nrow(data)
[1] 44771

Since this is a large dataset, I am using an Amazon EC2 instance with 64 GB of RAM, so memory hopefully isn't an issue. I am able to load the data (the first line works), but the as.matrix conversion (the second line) throws the following error:

resulting vector exceeds vector length limit in 'AnswerType'

Any clue what the issue might be?

Recommended answer

As noted, the development version of R supports vectors longer than 2^31-1 elements (this support was later released in R 3.0.0). This is more or less transparent, for instance:

> m = matrix(0L, .Machine$integer.max / 4, 5)
> length(m)
[1] 2684354555

This is with

> R.version.string
[1] "R Under development (unstable) (2012-08-07 r60193)"
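
To see why the conversion in the question runs into the limit, the cell count can be checked against .Machine$integer.max. This is only a back-of-the-envelope sketch: the variable names are made up for illustration, and the memory estimate assumes every cell is stored as an 8-byte double, which may not match the actual column types.

```r
# Dimensions reported in the question.
n_row <- 44771
n_col <- 75713

# Plain numeric literals are doubles in R, so this product does not overflow.
n_cells <- n_row * n_col

# as.matrix() needs one vector element per cell; compare with the classic limit.
exceeds_limit <- n_cells > .Machine$integer.max
exceeds_limit

# Rough size in GiB, ASSUMING 8 bytes per cell (double storage).
approx_gib <- n_cells * 8 / 2^30
approx_gib
```

The comparison comes out TRUE, so on a build of R without long-vector support the matrix cannot be allocated no matter how much RAM the machine has.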

Large objects consume a lot of memory (62.5% of my 16G in this example), and doing anything useful with them requires several times that amount. Further, even simple operations on large data can take appreciable time, and many operations on long vectors are not yet supported:

> sum(m)
Error: long vectors not supported yet:
    /home/mtmorgan/src/R-devel/src/include/Rinlinedfuns.h:100
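
The 62.5% figure quoted for the example matrix can be roughly reproduced from its length. This is a sketch that assumes "16G" means 16 GiB and ignores the small per-object header; the variable names are illustrative only.

```r
# length(m) from the example matrix in this answer.
n_elements <- 2684354555

# R stores each integer in 4 bytes.
bytes_per_integer <- 4

# Approximate object size in GiB (header overhead ignored).
size_gib <- n_elements * bytes_per_integer / 2^30

# Fraction of a machine ASSUMED to have 16 GiB of RAM.
fraction_of_ram <- size_gib / 16
fraction_of_ram
```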

So it often makes sense to process the data in smaller chunks by iterating through the larger file. This gives full access to R's routines and allows parallel evaluation (via the parallel package). Another strategy is to down-sample the data, which should not be too intimidating to a statistical audience.
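
As a minimal sketch of the chunked approach, the loop below reads a CSV file a few rows at a time through an open connection and accumulates column sums. The demo file, the chunk size, and the choice of column sums as the per-chunk computation are all assumptions made for illustration; a real job would substitute its own file and summary, and could hand chunks to the parallel package instead of processing them sequentially.

```r
# Write a small demo file so the sketch is self-contained (hypothetical data).
path <- tempfile(fileext = ".csv")
demo <- data.frame(a = 1:10, b = 11:20, c = 21:30)
write.csv(demo, path, row.names = FALSE)

chunk_size <- 4  # rows per chunk; tune to the available memory

# Read the column names once from the first data row plus header.
col_names <- names(read.csv(path, nrows = 1))

con <- file(path, open = "r")
invisible(readLines(con, n = 1))  # skip the header line

totals <- numeric(length(col_names))
repeat {
  # readLines() returns character(0) once the connection is exhausted.
  lines <- readLines(con, n = chunk_size)
  if (length(lines) == 0) break

  # Parse only this chunk; the full table is never held in memory.
  chunk <- read.csv(text = lines, header = FALSE, col.names = col_names)

  # Per-chunk work goes here; column sums are just a stand-in.
  totals <- totals + colSums(chunk)
}
close(con)

totals
```

Because the connection stays open between iterations, each readLines() call resumes where the previous one stopped, so every row is visited exactly once.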
