Using AWS for parallel processing with R
Question
I want to take a shot at the Kaggle Dunnhumby challenge by building a model for each customer. I want to split the data into ten groups and use Amazon Web Services (AWS) to build models in R on the ten groups in parallel. Some relevant links I have come across are:
- The segue package;
- A presentation on parallel web-services using Amazon.
What I don't understand is:
- How do I get the data onto the ten nodes?
- How do I send R functions to the nodes and execute them there?
I would be very grateful if you could share suggestions and hints to point me in the right direction.
PS I am using the free usage account on AWS but it was very difficult to install R from source on the Amazon Linux AMIs (lots of errors due to missing headers, libraries and other dependencies).
Recommended answer
You can set everything up manually on AWS. You have to build your own Amazon compute cluster from several instances. There is a nice tutorial video available from Amazon: http://www.youtube.com/watch?v=YfCgK1bmCjw
But it will take you several hours to get everything running:
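- Start 11 EC2 instances (one instance per group plus one head instance)
- Install R and MPI on all machines (check for preinstalled images)
- Configure MPI correctly (and probably add a security layer)
- Ideally, mount a file server on all nodes (to share the data)
- With this infrastructure, the best solution is to use the snow or foreach package (with Rmpi)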
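
The last step can be sketched roughly as follows. This is a minimal illustration rather than a tested cluster setup: it uses the `parallel` package bundled with R (which incorporates snow's cluster functions) instead of snow with Rmpi, it runs two local socket workers instead of ten EC2 nodes, and the data frame `transactions`, its columns, and the helper `fit_group` are made up for the example.

```r
library(parallel)  # bundled with R; provides snow-style cluster functions

# Hypothetical toy data standing in for the real transaction data
set.seed(1)
transactions <- data.frame(
  customer_id = rep(1:100, each = 5),
  visit       = rep(1:5, times = 100),
  spend       = runif(500, min = 5, max = 50)
)

# Assign every customer to one of ten groups, then split the rows by group
n_groups <- 10
group_id <- (transactions$customer_id %% n_groups) + 1
groups   <- split(transactions, group_id)

# Hypothetical helper: fit one small linear model per customer in a group
fit_group <- function(df) {
  lapply(split(df, df$customer_id),
         function(d) coef(lm(spend ~ visit, data = d)))
}

# Two local socket workers for illustration; on EC2 one would instead pass
# a character vector of worker host names (assumes SSH access and R on each)
cl <- makeCluster(2)

# clusterApply ships the function and one data chunk per task to a worker,
# so no shared file server is needed for data of this size
results <- clusterApply(cl, groups, fit_group)
stopCluster(cl)
```

Here `results` is a list with one element per group, each holding the per-customer coefficients. Passing the chunks through `clusterApply` is what moves the data to the workers; for data too large to serialize this way, the shared file server from the list above would be the more likely route.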
The segue package is nice but you will definitely get data communication problems!
The simplest solution is cloudnumbers.com (http://www.cloudnumbers.com). This platform gives you easy access to computer clusters in the cloud. You can test a small computer cluster in the cloud for 5 hours for free! Check the slides from the useR conference: http://cloudnumbers.com/hpc-news-from-the-user2011-conference