R running very slowly after loading large datasets > 8GB


Problem description


I have been unable to work in R given how slowly it operates once my datasets are loaded. These datasets total around 8GB. I am running on 8GB of RAM and have adjusted memory.limit to exceed my RAM, but nothing seems to be working. I have also used fread from the data.table package to read these files, simply because read.table would not run.
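For reference, a minimal sketch of the loading setup described above (the file name and memory value are placeholders; note that memory.limit() only has an effect on Windows and is defunct in recent R versions):

```r
library(data.table)

# Raise R's memory ceiling (value in MB). Windows-only; a no-op or
# defunct call on other platforms and on R >= 4.2.
memory.limit(size = 16000)

# fread() is far faster than read.table() for large delimited files
DT <- fread("big_file.csv")
```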


After seeing a similar post on the forum addressing the same issue, I have attempted to run gctorture(), but to no avail.


R is running so slowly that I cannot even check the length of the list of datasets I have loaded, use View, or perform any basic operation once they are loaded.


I have tried loading the datasets in 'pieces' (1/3 of the total data at a time, over three loads), which seemed to make the importing go more smoothly, but it has not changed how slowly R runs afterwards.
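The chunked-loading approach might look like the following sketch (file names hypothetical). Combining the pieces with data.table::rbindlist makes a single allocation instead of copying on every repeated rbind() call:

```r
library(data.table)

# Hypothetical file names: the data split into three parts
files <- c("data_part1.csv", "data_part2.csv", "data_part3.csv")

# Read each piece, then bind all of them at once
DT <- rbindlist(lapply(files, fread))
```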


Is there any way to get around this issue? Any help would be much appreciated.

Thanks in advance.

Recommended answer


The problem arises because R loads the full dataset into RAM, which usually brings the system to a halt when you try to View your data.


If it's a really huge dataset, first make sure the data contains only the most important columns and rows. Relevant columns can be identified from the domain knowledge you have about the problem. You can also try to eliminate rows with missing values.
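A sketch of this trimming with data.table (column names are hypothetical). The select argument makes fread parse only the named columns, so the rest never occupy RAM:

```r
library(data.table)

# Read only the columns you actually need (hypothetical names)
DT <- fread("big_file.csv", select = c("id", "date", "value"))

# Drop rows with missing values in a key column
DT <- DT[!is.na(value)]
```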


Once this is done, depending on the size of your data, you can try different approaches. One is to use packages like bigmemory and ff. bigmemory, for example, creates a pointer object through which you can read the data from disk without loading it into memory.
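A minimal bigmemory sketch, assuming purely numeric data (bigmemory matrices hold a single type; file names are placeholders):

```r
library(bigmemory)

# Create a file-backed big.matrix: the data stays on disk and only
# a pointer lives in R's memory
X <- read.big.matrix("big_file.csv", header = TRUE, type = "double",
                     backingfile = "big_file.bin",
                     descriptorfile = "big_file.desc")

# Later sessions can re-attach without re-reading the CSV
X <- attach.big.matrix("big_file.desc")

# Subsets are pulled from disk on demand
head(X[, 1])
```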


Another approach is parallelism (implicit or explicit). MapReduce is another package that is very useful for handling big datasets.
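Explicit parallelism can be sketched with base R's parallel package (data and column names here are a toy stand-in for the loaded datasets):

```r
library(parallel)
library(data.table)

# Toy stand-in for the loaded data (hypothetical columns)
DT <- data.table(group = rep(1:4, each = 250000), value = rnorm(1e6))

# One task per group; mclapply forks on Unix-alikes. On Windows,
# use parLapply() with makeCluster() instead.
results <- mclapply(split(DT$value, DT$group), mean,
                    mc.cores = max(1, detectCores() - 1))
```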


For more information on these, check out this blog post on rpubs and this old but gold post from SO.

