Recommended packages for very large dataset processing and machine learning in R

Question

It seems like R is really designed to handle datasets that it can pull entirely into memory. What R packages are recommended for signal processing and machine learning on very large datasets that cannot be pulled into memory?

If R is simply the wrong way to do this, I am open to other robust free suggestions (e.g. scipy, if there is some nice way to handle very large datasets).

Recommended answer

Have a look at the "Large memory and out-of-memory data" subsection of the high-performance computing task view on CRAN. bigmemory and ff are two popular packages. For bigmemory (and the related biganalytics and bigtabulate), the bigmemory website has a few very good presentations, vignettes, and overviews from Jay Emerson. For ff, I recommend reading the excellent slide presentations by Adler, Oehlschlägel, and colleagues on the ff website.
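Both bigmemory and ff can keep data in files on disk rather than in RAM. As a rough illustration of that general idea only (not of either package's API), here is a minimal sketch in Python, since the question is open to non-R tools. It uses only the standard library; the flat file of doubles and the helper names `write_doubles`, `mapped_mean`, and `demo_mapped` are assumptions made up for this example.

```python
import mmap
import os
import struct
import tempfile

DOUBLE = struct.Struct("<d")  # one little-endian 8-byte float


def write_doubles(path, values):
    """Write values to a flat binary file, one double after another."""
    with open(path, "wb") as f:
        for v in values:
            f.write(DOUBLE.pack(v))


def mapped_mean(path, start, stop):
    """Mean of elements [start, stop), read through a memory map."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            total = 0.0
            for i in range(start, stop):
                # unpack_from reads 8 bytes at the given byte offset, so
                # only the touched part of the file has to be paged in.
                total += DOUBLE.unpack_from(mm, i * DOUBLE.size)[0]
    return total / (stop - start)


def demo_mapped():
    # Hypothetical data file; a real one would be far larger than RAM.
    fd, path = tempfile.mkstemp(suffix=".bin")
    os.close(fd)
    try:
        write_doubles(path, (float(i) for i in range(100_000)))
        return mapped_mean(path, 10, 20)
    finally:
        os.remove(path)
```

Calling `demo_mapped()` averages ten elements from the middle of the file without loading the rest. The packages named above wrap this kind of access in matrix-like and vector-like R objects, so their documentation is the place to look for the real interfaces.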

Also, consider storing data in a database and reading in smaller batches for analysis. There are likely any number of approaches to consider. To get started, consider looking through some of the examples in the biglm package, as well as this presentation from Thomas Lumley.
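The database-plus-batches idea can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the biglm implementation: it fits a one-predictor least-squares line from running sums, a simpler stand-in for the incremental model updating that biglm offers. The table name `samples`, its columns `x` and `y`, and the function names are hypothetical.

```python
import sqlite3


def fit_line_in_batches(conn, batch_size=1000):
    """Fit y = a + b*x by least squares, reading rows in batches.

    Only five running sums are kept in memory, never the full table.
    The table `samples` and its columns `x`, `y` are hypothetical.
    """
    n = sx = sy = sxx = sxy = 0.0
    cur = conn.execute("SELECT x, y FROM samples")
    while True:
        rows = cur.fetchmany(batch_size)  # at most batch_size rows at once
        if not rows:
            break
        for x, y in rows:
            n += 1
            sx += x
            sy += y
            sxx += x * x
            sxy += x * y
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope


def demo():
    # An in-memory database keeps the sketch self-contained; a dataset
    # too large for RAM would live in an on-disk database instead.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE samples (x REAL, y REAL)")
    conn.executemany(
        "INSERT INTO samples VALUES (?, ?)",
        ((float(i), 2.0 + 3.0 * i) for i in range(10_000)),
    )
    return fit_line_in_batches(conn, batch_size=512)
```

Here `demo()` recovers an intercept close to 2 and a slope close to 3 from the synthetic rows. The design point is that memory use depends on `batch_size`, not on the number of rows in the table.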

And do investigate the other packages listed on the high-performance computing task view and those mentioned in the other answers. The packages I mention above are simply the ones I happen to have more experience with.
