Best practices for storing and using data frames too large for memory?


Problem description


I'm working with a large data frame, and have run up against RAM limits. At this point, I probably need to work with a serialized version on the disk. There are a few packages to support out-of-memory operations, but I'm not sure which one will suit my needs. I'd prefer to keep everything in data frames, so the ff package looks encouraging, but there are still compatibility problems that I can't work around.


What's the first tool to reach for when you realize that your data has reached out-of-memory scale?

Answer


You probably want to look at these packages:

  • ff for 'flat-file' storage and very efficient retrieval (can do data.frames; different data types)
  • bigmemory for out-of-R-memory but still in RAM (or file-backed) use (can only do matrices; same data type)
  • biglm for out-of-memory model fitting with lm() and glm()-style models.
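Of the three, ff is the closest fit to the question, since it keeps a data.frame-like interface while the data lives on disk. A minimal sketch of reading a large CSV into a file-backed ffdf (assuming the ff package is installed; the file path is illustrative):

```r
# Sketch: file-backed data frame with ff.
# Assumes install.packages("ff") has been run; "big_data.csv" is a placeholder.
library(ff)

# read.csv.ffdf builds an ffdf, a disk-backed analogue of a data.frame;
# rows are paged in from the flat file on demand rather than held in RAM.
big <- read.csv.ffdf(file = "big_data.csv", header = TRUE)

# Many data.frame idioms carry over directly.
nrow(big)
summary(big[, 1])
```

An ffdf is not a drop-in replacement everywhere (some functions insist on a true data.frame), which is the kind of compatibility friction the question mentions.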

See also the High-Performance Computing task view on CRAN.
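For model fitting specifically, biglm avoids loading the data at all: you fit on one chunk and fold in the rest with update(), so only the model's compact summary statistics stay in memory. A sketch with made-up chunks (assuming the biglm package is installed; in practice each chunk would come from disk, e.g. an ff object or a chunked file reader):

```r
# Sketch: chunked linear-model fitting with biglm.
# Assumes install.packages("biglm"); the random chunks stand in for
# successive pieces of a data set too large to load at once.
library(biglm)

chunk1 <- data.frame(y = rnorm(1000), x = rnorm(1000))
fit <- biglm(y ~ x, data = chunk1)

# Fold in further chunks; the raw rows are discarded after each update.
chunk2 <- data.frame(y = rnorm(1000), x = rnorm(1000))
fit <- update(fit, chunk2)

coef(fit)
summary(fit)
```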

