Practical limits of R data frame
Question
I have been reading about how read.table is inefficient for large data files, and how R is not suited to large data sets. So I was wondering where I can find the practical limits and any performance charts for (1) reading in data of various sizes and (2) working with data of varying sizes.
In effect, I want to know when performance deteriorates and when I hit a roadblock. Any comparison against C++/MATLAB or other languages would also be really helpful. Finally, if there is any special performance comparison for Rcpp and RInside, that would be great!
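One way to probe the first question yourself is to time read.table/read.csv on generated files of increasing size, with and without the usual tuning hints (colClasses, nrows). A minimal sketch (the file name big.csv is illustrative):

```r
# Generate a moderately large CSV to time against.
n  <- 1e6
df <- data.frame(x = rnorm(n), y = sample(letters, n, replace = TRUE))
write.csv(df, "big.csv", row.names = FALSE)

# Naive read: read.csv must guess column classes and grow buffers as it goes.
system.time(read.csv("big.csv"))

# Tuned read: supplying colClasses and nrows avoids that overhead.
system.time(read.csv("big.csv",
                     colClasses = c("numeric", "character"),
                     nrows = n))
```

Repeating this for n = 1e5, 1e6, 1e7, and so on gives a rough performance curve for your own machine.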
Answer
R is suited for large data sets, but you may have to change your way of working somewhat from what the introductory textbooks teach you. I did a post on Big Data for R that crunches a 30 GB data set, which you may find useful for inspiration.
The usual sources of information to get started are the High-Performance Computing Task View and the R-SIG HPC mailing list.
The main limit you have to work around is a historic limit on the length of a vector to 2^31 - 1 elements, which would not be so bad if R did not store matrices as vectors. (The limit is for compatibility with some BLAS libraries.)
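You can see the consequence of this limit interactively. Because a matrix is stored as one vector, its total element count (rows times columns) is what counts against the cap, so a square numeric matrix tops out around 46340 x 46340 on R versions without long-vector support (long vectors arrived in R 3.0.0):

```r
# Vector indices were historically 32-bit signed integers:
.Machine$integer.max   # 2147483647, i.e. 2^31 - 1

# Largest square matrix whose element count fits under that cap:
floor(sqrt(2^31 - 1))  # 46340
```

Note that hitting this limit is separate from running out of RAM; a 46340 x 46340 double matrix would already need about 16 GB of memory.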
We regularly analyse telco call data records and marketing databases with multi-million customers using R, so I would be happy to talk more if you are interested.