Out of memory when modifying a big R data.frame


Problem description

I have a big data frame that takes about 900 MB of RAM. I tried to modify it like this:

dataframe[[17]][37544]=0 

That seems to make R use more than 3 GB of RAM, and R complains "Error: cannot allocate vector of size 3.0 Mb". (I am on a 32-bit machine.)

I found that this way works better:

dataframe[37544, 17]=0

but R's footprint still doubled and the command takes quite some time to run.

From a C/C++ background, I am really confused about this behavior. I thought something like dataframe[37544, 17]=0 should be completed in a blink without costing any extra memory (only one cell should be modified). What is R doing for those commands I posted? What is the right way to modify some elements in a data frame then without doubling the memory footprint?

Thanks so much for your help!

Tao

Solution

Look up 'copy-on-write' in the context of R discussions related to memory. As soon as one part of a (potentially really large) data structure changes, a copy is made.
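A minimal sketch of the copy-on-write behaviour described above, using only base R. The vector size and the names x and y are made up for illustration. tracemem() only reports copies when R was built with memory profiling, so the sketch checks capabilities("profmem") before calling it.

```r
# Copy-on-write sketch (illustrative sizes, hypothetical names).
x <- numeric(1e6)   # roughly 8 MB of doubles
y <- x              # no copy yet: both names refer to the same vector

if (capabilities("profmem")) {
  invisible(tracemem(x))  # ask R to report when this vector is duplicated
}

y[1] <- 1           # writing through y forces a duplicate of the shared vector
print(x[1])         # x still holds its original value
print(y[1])         # only the copy was changed

if (capabilities("profmem")) {
  untracemem(x)
}
```

If tracing is available, a tracemem line should appear at the assignment to y[1], which is the moment the second full-size copy is allocated.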

A useful rule of thumb is that if your largest object is N MB/GB/... in size, you need around 3*N of RAM. Such is life with an interpreted system.
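Applied to the question, the rule gives 3 * 900 MB = 2.7 GB, which is close to or beyond what a 32-bit process can usually address. As a rough sketch of how one might estimate this for an object already in memory, the snippet below uses base R's object.size(). The toy data frame and the variable names are assumptions for illustration, and the factor of 3 is just the rule of thumb above, not a measured value.

```r
# Rough RAM estimate from the 3*N rule of thumb (toy data, hypothetical names).
df <- data.frame(id = seq_len(1e5), value = numeric(1e5))

size_mb   <- as.numeric(object.size(df)) / 1024^2  # N, in MB
needed_mb <- 3 * size_mb                           # rule-of-thumb headroom

cat(sprintf("object: %.2f MB, suggested RAM: %.2f MB\n", size_mb, needed_mb))
```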

Years ago, when I had to handle large amounts of data on 32-bit machines with little RAM relative to the data volume, I got good use out of early versions of the bigmemory package. It uses the 'external pointer' interface to keep large gobs of memory outside of R. That saves you not only the '3x' factor, but possibly more, as you may get away with non-contiguous memory (contiguous memory being the other thing R likes).
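As a hedged sketch of the bigmemory approach mentioned above: the example assumes the bigmemory package is installed and that the data can be held as a single-type numeric matrix, since a big.matrix stores one type and is not a drop-in replacement for a mixed-type data frame. The dimensions and the name m are made up; only the row and column indices come from the question.

```r
# bigmemory sketch: the matrix data lives outside R's own heap.
# Assumes install.packages("bigmemory") has been run; dimensions are illustrative.
if (requireNamespace("bigmemory", quietly = TRUE)) {
  library(bigmemory)

  m <- big.matrix(nrow = 50000, ncol = 20, type = "double", init = 1)

  m[37544, 17] <- 0        # write a single cell in place
  print(m[37544, 17])      # read the cell back
  print(dim(m))
} else {
  message("bigmemory is not installed; skipping the example")
}
```

For data larger than RAM, the package also offers file-backed matrices, which may be worth a look in its documentation.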
