Hadoop on EC2 vs Elastic Map Reduce

Question

I'm trying to evaluate the differences between these two options. Here are some pros and cons I can think of:

Elastic Map Reduce => Better support from Amazon, no need to administer the cluster, more expensive (?)
EC2 + Hadoop => More control over your Hadoop configuration, cheaper (?)

Has anyone benchmarked the performance of EC2 + Hadoop against EMR? Is there a significant difference in cost for large cluster deployments? What other differences exist?

Recommended answer

Well, administering, monitoring, and maintaining a cluster isn't a small task in itself. With EMR you can get machines configured and running with your custom bootstrap code in very little time. Beyond that, EMR provides a lot of other tools, options, and facilities.

With EMR you don't have to worry about terminating the cluster after the jobs are done. You can certainly implement this yourself in an EC2 + Hadoop setup, but EMR does it for you in a neat way.
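To make the bootstrap and auto-termination points concrete, here is a minimal sketch of the request one might pass to boto3's EMR `run_job_flow` call. The helper names (`build_job_flow_request`, `submit`), the bucket paths, the instance type, the release label, and the use of the default EMR roles are all assumptions for illustration and do not come from the original answer. Setting `KeepJobFlowAliveWhenNoSteps` to `False` is what asks EMR to shut the cluster down once the steps finish.

```python
def build_job_flow_request(name, bootstrap_script, step_jar, instance_count=3):
    """Build keyword arguments for boto3's EMR client `run_job_flow` call.

    All concrete values below are illustrative assumptions.
    """
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",  # assumed release; pick one that suits you
        "Instances": {
            "MasterInstanceType": "m5.xlarge",  # assumed instance type
            "SlaveInstanceType": "m5.xlarge",
            "InstanceCount": instance_count,
            # False: the cluster terminates itself when no steps remain.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        # Custom bootstrap code runs on every node before Hadoop starts.
        "BootstrapActions": [
            {
                "Name": "custom-setup",
                "ScriptBootstrapAction": {"Path": bootstrap_script, "Args": []},
            }
        ],
        "Steps": [
            {
                "Name": "main-job",
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {"Jar": step_jar, "Args": []},
            }
        ],
        "JobFlowRole": "EMR_EC2_DefaultRole",  # assumes the default roles exist
        "ServiceRole": "EMR_DefaultRole",
    }


def submit(request, region="us-east-1"):
    """Send the request to EMR; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3

    emr = boto3.client("emr", region_name=region)
    return emr.run_job_flow(**request)["JobFlowId"]
```

Keeping the request builder separate from the network call makes it easy to inspect or unit-test the configuration before anything is launched (and billed).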

You can also resize the cluster even while your jobs are running!
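As a rough illustration of resizing a running cluster, the sketch below builds the arguments for boto3's EMR `modify_instance_groups` call. The helper names and the example IDs are hypothetical; in practice the instance group ID would come from something like `list_instance_groups` for your cluster.

```python
def build_resize_request(cluster_id, instance_group_id, new_count):
    """Build keyword arguments for boto3's EMR `modify_instance_groups` call."""
    if new_count < 0:
        raise ValueError("new_count must not be negative")
    return {
        "ClusterId": cluster_id,
        "InstanceGroups": [
            {"InstanceGroupId": instance_group_id, "InstanceCount": new_count}
        ],
    }


def resize(request, region="us-east-1"):
    """Apply the resize; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3

    emr = boto3.client("emr", region_name=region)
    emr.modify_instance_groups(**request)
```

For example, `build_resize_request("j-EXAMPLE", "ig-EXAMPLE", 10)` describes growing one instance group to 10 nodes; both IDs here are placeholders.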

The Pig and Hive versions that ship with EMR also contain patches that make it easier to work with files in S3.

Even here, in this answer, you may find that EMR has been given the upper hand.
