Spark running on EC2 vs EMR

Question

We are students working on a graduation project in data science. We are developing a recommender engine using Spark with Python (PySpark), with an Android application as the user interface. We have faced a lot of roadblocks; one of them is how to keep the Spark script up and running in the cloud for fast processing and real-time results. All we know about EMR is that it is newer than EC2 and already has Hadoop installed on it. We are still having a hard time deciding which one to use, and what the differences between them are when it comes to Spark.

Recommended answer

EMR provides simple-to-use Hadoop/Spark as a service. You just select the components you want installed (Spark, Hadoop), their versions, how many machines you want to use and a couple of other options, and it installs everything for you. Since you are students, I assume you don't have experience with automation tools like Ansible, Puppet or Chef, and you have probably never had to maintain your own Hadoop cluster. If that is the case, I would definitely suggest EMR.

At the same time, as an experienced Hadoop/Spark user, I can tell you that it has its own limitations. When I used it 6 months ago, I wanted the latest version of EMR (4.0, if I remember correctly) because it supported the latest version of Spark, and I had a few headaches customising it to install Java 8 instead of the provided Java 7. I believe those were their early days of supporting Java 8, and they should have fixed that by now. But flexibility is what you give up with any "all included" solution, especially if you are an expert user.
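As a rough illustration of the "select the components, versions and number of machines" step, here is a minimal Python sketch using boto3's EMR client (`run_job_flow`). It is not part of the original answer. The helper names `build_emr_request` and `launch_cluster`, the cluster name, instance type and region are all hypothetical, and the sketch assumes the default EMR roles (`EMR_DefaultRole`, `EMR_EC2_DefaultRole`) already exist in your AWS account.

```python
def build_emr_request(name, release_label, instance_type, instance_count):
    """Build keyword arguments for boto3's EMR run_job_flow call.

    All values passed in are illustrative; pick a release label and
    instance type that are actually available in your account and region.
    """
    return {
        "Name": name,
        # The EMR release determines which Spark/Hadoop versions are installed.
        "ReleaseLabel": release_label,
        # Components to install on the cluster.
        "Applications": [{"Name": "Hadoop"}, {"Name": "Spark"}],
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            # Total number of machines, master included.
            "InstanceCount": instance_count,
            # Keep the cluster alive so jobs can be submitted later.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Assumption: the default EMR roles have already been created.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def launch_cluster(request, region="us-east-1"):
    """Submit the request to EMR and return the new cluster id.

    Requires boto3 and valid AWS credentials; this starts billable resources.
    """
    import boto3  # imported here so building a request works without boto3

    emr = boto3.client("emr", region_name=region)
    return emr.run_job_flow(**request)["JobFlowId"]


if __name__ == "__main__":
    # "emr-4.0.0" matches the era of the answer; replace it with a current release.
    request = build_emr_request("recommender-dev", "emr-4.0.0", "m4.large", 3)
    print(request["Applications"])
    # Uncomment to actually start a cluster:
    # print(launch_cluster(request))
```

On plain EC2, by contrast, you would provision the instances yourself and then install and configure Hadoop and Spark on each of them, which is where tools such as Ansible, Puppet or Chef come in.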
