EMR vs EC2/Hadoop on AWS

Question

I know that EC2 is more flexible than EMR but takes more work. In terms of cost, however, running Hadoop directly on EC2 probably requires EBS volumes attached to the EC2 instances, whereas EMR just streams data in from S3. Crunching the numbers on the AWS calculator, even though with EMR one must also pay for EC2, EMR comes out cheaper than EC2. Am I wrong here? Of course, EC2 with EBS is probably faster, but is it worth the cost?

Thanks, Matt

Solution

EMR does a lot of things for you that you won't find in a standard Hadoop setup on EC2. Some particularly important ones include:

  • Copying Hadoop logs from your machines to S3. This is very useful for debugging errors after the cluster has been shut down.
  • Running job flows of multiple MapReduce, Pig, or Hive jobs
  • Setting sensible configuration defaults based on the hardware size you choose
  • Access to spot instances for cheaper compute
  • Ability to resize clusters dynamically
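
As a rough illustration of how several of these features show up in a cluster request, here is a minimal sketch that builds the keyword arguments one might pass to boto3's `run_job_flow`. The cluster name, bucket, instance types, counts, release label, and role names are placeholders assumed for illustration; none of them come from the answer above.

```python
def build_emr_request(log_bucket, core_count=2, task_count=4):
    """Build keyword arguments for boto3's EMR run_job_flow call.

    Every concrete value below (names, instance types, release label,
    roles) is an illustrative assumption; adjust for a real account.
    """
    return {
        "Name": "example-cluster",
        # EMR copies Hadoop logs here, so they outlive the cluster.
        "LogUri": f"s3://{log_bucket}/emr-logs/",
        "ReleaseLabel": "emr-6.15.0",  # assumed placeholder release
        "Instances": {
            "InstanceGroups": [
                {
                    "Name": "master",
                    "InstanceRole": "MASTER",
                    "Market": "ON_DEMAND",
                    "InstanceType": "m5.xlarge",
                    "InstanceCount": 1,
                },
                {
                    "Name": "core",
                    "InstanceRole": "CORE",
                    "Market": "ON_DEMAND",
                    "InstanceType": "m5.xlarge",
                    "InstanceCount": core_count,
                },
                {
                    # Task nodes on the spot market for cheaper compute.
                    "Name": "task-spot",
                    "InstanceRole": "TASK",
                    "Market": "SPOT",
                    "InstanceType": "m5.xlarge",
                    "InstanceCount": task_count,
                },
            ],
            # Shut the cluster down once all steps have finished.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


if __name__ == "__main__":
    request = build_emr_request("my-log-bucket")
    for group in request["Instances"]["InstanceGroups"]:
        print(group["Name"], group["Market"], group["InstanceCount"])
```

With credentials configured, the dictionary could be passed as `boto3.client("emr").run_job_flow(**request)`, and a running cluster could later be resized through the same client's `modify_instance_groups` call.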

You'll also find that the EMR S3 filesystem is faster and more reliable than the standard one packaged with Apache Hadoop. It supports multipart upload, and it streams writes directly to S3 rather than buffering to disk first. For a bit more on this, see Tip #5.

Additionally, if you do decide to use EC2 directly, I'd recommend using instance storage instead of EBS for your nodes. There's really no reason to pay the extra cost of EBS for Hadoop; you'll notice that EMR clusters all run on instance-storage nodes as well.
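
To make the cost comparison from the question concrete, the arithmetic can be sketched roughly as follows. Every rate in the example is a made-up placeholder, not a real AWS price; substitute current figures from the AWS pricing pages. The sketch assumes EMR adds a per-instance hourly fee on top of the EC2 price, and that EBS is billed per GB-month.

```python
HOURS_PER_MONTH = 730  # common approximation for monthly estimates


def emr_hourly_cost(nodes, ec2_rate, emr_fee):
    """Hourly cost of an EMR cluster: EC2 price plus the EMR fee per node."""
    return nodes * (ec2_rate + emr_fee)


def ec2_ebs_hourly_cost(nodes, ec2_rate, ebs_gb_per_node, ebs_gb_month_rate):
    """Hourly cost of self-managed Hadoop on EC2 with one EBS volume per node."""
    ebs_hourly = ebs_gb_per_node * ebs_gb_month_rate / HOURS_PER_MONTH
    return nodes * (ec2_rate + ebs_hourly)


if __name__ == "__main__":
    # Hypothetical rates, chosen only to exercise the formulas.
    emr = emr_hourly_cost(nodes=10, ec2_rate=0.20, emr_fee=0.05)
    ec2 = ec2_ebs_hourly_cost(
        nodes=10, ec2_rate=0.20, ebs_gb_per_node=500, ebs_gb_month_rate=0.10
    )
    print(f"EMR:       ${emr:.2f}/hour")
    print(f"EC2 + EBS: ${ec2:.2f}/hour")
```

This leaves out S3 storage and request charges on the EMR side, and it treats the cluster as running continuously. EBS volumes keep billing for as long as they are provisioned, so a cluster that only runs part of the time shifts the comparison further.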
