Recommended package for very large dataset processing and machine learning in R

Question

It seems like R is really designed to handle datasets that it can pull entirely into memory. What R packages are recommended for signal processing and machine learning on very large datasets that can not be pulled into memory?

If R is simply the wrong way to do this, I am open to other robust free suggestions (e.g. scipy if there is some nice way to handle very large datasets)

Recommended answer

Have a look at the "Large memory and out-of-memory data" subsection of the high performance computing task view on CRAN. bigmemory and ff are two popular packages. For bigmemory (and the related biganalytics and bigtabulate), the bigmemory website has a few very good presentations, vignettes, and overviews from Jay Emerson. For ff, I recommend reading the excellent slide presentations by Adler, Oehlschlägel, and colleagues on the ff website.
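
As a rough illustration of the out-of-memory style these packages enable, here is a minimal sketch using bigmemory's filebacked.big.matrix(), assuming the package is installed from CRAN. The sizes, contents, and file names are hypothetical; the point is that the matrix lives in a file on disk and R only pulls one block of rows into memory at a time.

```r
# Minimal sketch, assuming the bigmemory package is installed.
# Sizes and file names below are hypothetical, chosen for illustration.
library(bigmemory)

bm_dir <- tempfile("bigmemory_demo_")  # fresh directory for the backing files
dir.create(bm_dir)

n <- 1e5      # rows; the same code applies when n is far too large for RAM
block <- 1e4  # rows processed per step

# The data lives in example.bin on disk; example.desc describes it so that
# other R sessions can reattach it.
x <- filebacked.big.matrix(
  nrow = n, ncol = 2, type = "double", init = 0,
  backingfile = "example.bin",
  descriptorfile = "example.desc",
  backingpath = bm_dir
)

# Fill the matrix one block of rows at a time.
for (start in seq(1, n, by = block)) {
  rows <- start:min(start + block - 1, n)
  x[rows, 1] <- rows      # column 1: row index
  x[rows, 2] <- 2 * rows  # column 2: twice the row index
}

# Compute the mean of column 2 block by block, never loading the whole column.
total <- 0
for (start in seq(1, n, by = block)) {
  rows <- start:min(start + block - 1, n)
  total <- total + sum(x[rows, 2])
}
col2_mean <- total / n
print(col2_mean)
```

Because the backing and descriptor files persist on disk, another R session or a parallel worker can reattach the same data with attach.big.matrix() rather than reloading it, and biganalytics adds functions such as colmean() and bigkmeans() that operate on big.matrix objects directly.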

Also, consider storing data in a database and reading it in smaller batches for analysis. There are many possible approaches. To get started, consider looking through some of the examples in the biglm package, as well as a presentation from Thomas Lumley.
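
The batch idea can be sketched without extra packages. biglm packages this pattern properly: biglm() fits a linear model on a first chunk and update() folds in later chunks using an incremental QR decomposition. The sketch below is a simpler, swapped-in technique rather than biglm's algorithm: it accumulates the cross-products X'X and X'y over batches of a CSV file and solves the normal equations at the end, which works for well-conditioned problems but is less numerically stable than QR. The data, file, column names, and batch size are hypothetical.

```r
# Minimal sketch in base R: fit y ~ x1 + x2 from a CSV read in batches,
# accumulating X'X and X'y instead of holding all rows in memory.
# The file, its columns, and the batch size are hypothetical.

# Write a demo file to stand in for data too large to load at once.
set.seed(1)
n <- 5000
demo <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
demo$y <- 1 + 2 * demo$x1 - 3 * demo$x2 + rnorm(n, sd = 0.1)
path <- tempfile(fileext = ".csv")
write.csv(demo, path, row.names = FALSE)

chunk_size <- 1000
col_names <- names(read.csv(path, nrows = 1))  # read only the header row

xtx <- matrix(0, 3, 3)  # running X'X (intercept, x1, x2)
xty <- matrix(0, 3, 1)  # running X'y

con <- file(path, open = "r")
invisible(readLines(con, n = 1))  # skip the header line
repeat {
  # On an open connection, read.csv continues from the current position;
  # it signals an error once no lines remain, which ends the loop.
  batch <- tryCatch(
    read.csv(con, header = FALSE, col.names = col_names, nrows = chunk_size),
    error = function(e) NULL
  )
  if (is.null(batch) || nrow(batch) == 0) break
  X <- cbind(1, batch$x1, batch$x2)
  xtx <- xtx + crossprod(X)
  xty <- xty + crossprod(X, batch$y)
}
close(con)

beta <- drop(solve(xtx, xty))  # solve the normal equations
names(beta) <- c("(Intercept)", "x1", "x2")
print(beta)
```

Switching from a file to a database only changes where each batch comes from: with DBI, a query started with dbSendQuery() can be consumed in pieces with dbFetch(res, n = chunk_size) until dbHasCompleted() returns TRUE.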

And do investigate the other packages on the high-performance computing task view, as well as those mentioned in the other answers. The packages I mention above are simply the ones I've happened to have more experience with.