Practical limits of R data frame

Question

I have been reading about how read.table is not efficient for large data files, and about how R is not suited for large data sets. So I was wondering where I can find the practical limits, and any performance charts, for (1) reading in data of various sizes and (2) working with data of varying sizes.

In effect, I want to know when performance deteriorates and when I will hit a roadblock. Any comparison against C++/MATLAB or other languages would also be really helpful. Finally, if there is any specific performance comparison for Rcpp and RInside, that would be great!

Answer

R is suited for large data sets, but you may have to change your way of working somewhat from what the introductory textbooks teach you. I did a post on Big Data for R which crunches a 30 GB data set and which you may find useful for inspiration.
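On the read.table point from the question: the documented mitigations (see ?read.table) are to declare colClasses up front, supply nrows as an allocation hint, and disable comment scanning. A minimal self-contained sketch (the temp file and column layout are illustrative):

```r
# Write a small demo file so the example is self-contained.
tf <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:1000, x = runif(1000)), tf, row.names = FALSE)

# Tuned read: colClasses skips the type-guessing passes, nrows lets R
# allocate the result in one go, and comment.char = "" disables
# comment scanning (already the default for read.csv).
df <- read.csv(tf,
               colClasses = c("integer", "numeric"),
               nrows = 1000,
               comment.char = "")
stopifnot(is.data.frame(df), nrow(df) == 1000)
```

On a genuinely large file, pre-declaring colClasses is usually the single biggest win, since it spares read.table from scanning the data to guess types.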

The usual sources of information to get started are the High-Performance Computing Task View and the R-SIG HPC mailing list.

The main limit you have to work around is a historic limit on the length of a vector of 2^31 - 1 elements, which would not be so bad if R did not store matrices as vectors. (The limit exists for compatibility with some BLAS libraries.)
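That historic limit can be checked directly: R's integer type is a 32-bit signed integer, which is what capped vector indices (and still caps matrix dimensions):

```r
# 32-bit signed integers are what capped the classic vector length.
stopifnot(.Machine$integer.max == 2^31 - 1)  # 2147483647

# Since R 3.0.0, atomic "long vectors" may hold up to 2^52 elements,
# but matrix dimensions are still stored as integers, so each side of
# a matrix remains capped at 2^31 - 1.
.Machine$integer.max
```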

We regularly use R to analyse telco call data records and multi-million-customer marketing databases, so I would be happy to talk more if you are interested.
