R, RAM amounts, and specific limitations to avoid memory errors


Question


I have read about various big data packages with R. Many seem workable except that, at least as I understand the issue, many of the packages I like to use for common models would not be available in conjunction with the recommended big data packages (for instance, I use lme4, VGAM, and other fairly common varieties of regression analysis packages that don't seem to play well with the various big data packages like ff, etc.).


I recently attempted to use VGAM to do polytomous models using data from the General Social Survey. When I tossed some models on to run that accounted for the clustering of respondents in years as well as a list of other controls, I started hitting the whole "cannot allocate vector of size yadda yadda..." error. I've tried various recommended items, such as clearing memory out and using matrices where possible, to no good effect. I am inclined to increase the RAM on my machine (actually just buy a new machine with more RAM), but I want a good idea as to whether that will solve my woes before letting go of $1500 on a new machine, particularly since this is for my personal use and will be funded solely out of my grad student budget.
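Before buying hardware, it can be worth checking how much memory the offending objects actually occupy and releasing intermediates explicitly. A minimal R sketch (the matrix here is just a stand-in for your own data, not taken from the question's actual models):

```r
# A stand-in object: a 100,000 x 10 numeric matrix (doubles are 8 bytes each)
x <- matrix(rnorm(1e6), nrow = 1e5)
print(object.size(x), units = "MB")  # roughly 7.6 MB plus a small header

# Drop large intermediates and trigger garbage collection so freed
# memory is reclaimed before the next model run
rm(x)
gc()
```

`gc()` runs automatically anyway; calling it by hand is mostly useful for the summary of current usage it prints.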


Currently I am running a Windows 8 machine with 16GB RAM, R 3.0.2, and all the packages I use updated to their most recent versions. The data sets I typically work with max out at under 100,000 individual cases/respondents. As far as analyses go, I may need matrices and/or data frames with many rows: for example, if I use 15 variables with interactions between factors that have several levels, or if I need to reshape to one row per category of some DV per respondent for each of my 100,000 cases. That may be a touch large for some social science work, but in the grand scheme of things my requirements are not all that hefty as far as data analysis goes. I'm sure many R users do far more intense analyses on much bigger data.
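Since a dense double-precision matrix costs 8 bytes per cell, the footprint of the model matrix a formula expands to can be estimated up front. A back-of-envelope sketch (the row and column counts are illustrative assumptions, not figures from the question):

```r
# rows x columns x 8 bytes for a dense numeric (double) matrix
rows  <- 100000   # cases/respondents
cols  <- 500      # columns after expanding factors and interactions
bytes <- rows * cols * 8
cat(sprintf("model matrix: ~%.2f GB\n", bytes / 1024^3))  # about 0.37 GB
```

Even when the raw matrix fits comfortably, fitting functions typically make several working copies of it, so peak usage is a multiple of this figure.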


So, I guess my question is this - given the data size and types of analyses I'm typically working with, what would be a comfortable amount of RAM to avoid memory errors and/or having to use special packages to handle the size of the data/processes I'm running? For instance, I'm eye-balling a machine that sports 32GB RAM. Will that cut it? Should I go for 64GB RAM? Or do I really need to bite the bullet, so to speak, and start learning to use R with big data packages, or maybe just find a different stats package or learn a more intense programming language (not even sure what that would be, Python, C++??). The latter option would be nice in the long run of course, but would be rather prohibitive for me at the moment. I'm mid-stream on a couple of projects where I am hitting similar issues and don't have time to build new language skills altogether under deadlines.


To be as specific as possible: what is the max capability of 64-bit R on a good machine with 16GB, 32GB, and 64GB RAM? I searched around and didn't find clear answers that I could use to gauge my personal needs at this time.

Answer


A general rule of thumb is that R roughly needs three times the dataset size in RAM to be able to work comfortably. This is caused by the copying of objects in R. So, divide your RAM size by three to get a rough estimate of your maximum dataset size. Then you can look at the type of data you use, and choose how much RAM you need.
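Applying that RAM/3 rule of thumb to the machines under consideration gives a rough ceiling on dataset size. Note that this treats all RAM as available to R, so real headroom is somewhat lower once the OS and other programs take their share:

```r
# Rough maximum comfortable dataset size under the "RAM / 3" rule of thumb
ram_gb <- c(16, 32, 64)
max_dataset_gb <- round(ram_gb / 3, 1)
print(setNames(max_dataset_gb, paste0(ram_gb, " GB RAM")))
# roughly 5.3, 10.7, and 21.3 GB respectively
```

On that reckoning, a 16GB machine already handles datasets well beyond the under-100,000-case surveys described in the question, which suggests the errors come from model-matrix expansion and copying rather than the raw data size.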


Of course, R can also process data out-of-memory, see the HPC task view. This earlier answer of mine might also be of interest.

