Autoscaling EMR: is it required? Should I just use EC2? Should I just use Qubole?


Problem description

To reduce provisioning time, we've decided to keep a dedicated EMR cluster of 5 instances running (we expect to need about 5). In case we need more, we think we'll need to implement some sort of autoscaling.

I'm not familiar with EMR at all. Does it support autoscaling? I found this in the docs: http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-manage-resize.html

Is that the correct place to look for autoscaling, or am I misunderstanding what they mean by "resize"? I've read that one benefit of EMR is "on demand processing", and I think it splits the load between EC2 instances without you specifying how many instances. This gives me the impression that it scales the EC2 instances on its own, meaning we don't need to autoscale ourselves. Am I misunderstanding what "on demand processing" means?

If the resizing link I provided is appropriate for what I'm trying to do, does anyone have experience with determining when to resize? The doc only describes how to resize, not, for example, how to set up an alarm for when to resize. I've used their regular autoscaling service, which lets you resize based on certain conditions, but I'm not seeing that here.

I'm still unsure whether autoscaling EMR is a bad idea. Is it too involved (since there are entire companies like Qubole that provide this), or maybe not very useful since EMR already uses whatever computing power it needs? I don't know very much about what EMR actually provides, so maybe that's why I'm confused.

Solution

The page you linked shows ways of either manually or programmatically increasing the number of nodes in your cluster. I couldn't find anything else about autoscaling for EMR.

Unless we're missing some facts, you'd still have to come up with your own scaling algorithm and process. If you're taking into account factors such as your job backlog, the units of time you're paying for, the use of less-expensive "spot" instances, multiple clusters, etc., this is probably not a trivial exercise.
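To give a feel for what such a home-grown scaling rule might look like, here is a minimal sketch that covers only two of the factors mentioned above: job backlog and the billing unit. Everything in it is an assumption made for illustration (the function names, the idea of a fixed jobs-per-node throughput, the 60-minute billing unit and the 10-minute release window); it is not something EMR provides, and spot pricing and multiple clusters are left out entirely.

```python
import math


def desired_task_nodes(pending_jobs, jobs_per_node_per_hour,
                       min_nodes, max_nodes, target_hours=1.0):
    """Hypothetical rule: nodes needed to clear the backlog within target_hours,
    clamped to the range [min_nodes, max_nodes]."""
    if jobs_per_node_per_hour <= 0 or target_hours <= 0:
        raise ValueError("throughput and target_hours must be positive")
    needed = math.ceil(pending_jobs / (jobs_per_node_per_hour * target_hours))
    return max(min_nodes, min(max_nodes, needed))


def worth_releasing_now(minutes_since_launch, billing_unit_minutes=60,
                        release_window_minutes=10):
    """Hypothetical rule: only release a node near the end of a billing unit
    that has already been paid for (the unit length is an assumption)."""
    used = minutes_since_launch % billing_unit_minutes
    return used >= billing_unit_minutes - release_window_minutes
```

For example, `desired_task_nodes(21, 4, 2, 10)` asks for 6 nodes, because 21 jobs at 4 jobs per node per hour need just over 5 nodes to finish within an hour. A real policy would also need damping so the cluster does not resize on every small change in backlog.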

In addition to increasing the size of your cluster, there is also downsizing. EMR allows this (manually or programmatically) for task nodes, but the documentation states that it is not supported for core nodes. You'd have to terminate the core node through other AWS functionality and risk losing data. If your workloads increase and decrease over time, core node downsizing would be valuable for keeping your costs lower.
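As a rough illustration of the programmatic route for task nodes, the sketch below wraps the `modify_instance_groups` call of the boto3 EMR client. The wrapper name `resize_task_group` is hypothetical, and the cluster and instance group IDs are placeholders you would look up yourself. The client is passed in as a parameter so the function can be exercised without AWS credentials.

```python
def resize_task_group(emr_client, cluster_id, instance_group_id, new_count):
    """Ask EMR to set a task instance group to new_count nodes.

    emr_client is expected to behave like boto3.client("emr").
    The IDs are supplied by the caller; nothing here discovers them.
    """
    if new_count < 0:
        raise ValueError("new_count must not be negative")
    return emr_client.modify_instance_groups(
        ClusterId=cluster_id,
        InstanceGroups=[
            {"InstanceGroupId": instance_group_id, "InstanceCount": new_count}
        ],
    )
```

With boto3 installed and credentials configured, this would be called along the lines of `resize_task_group(boto3.client("emr"), "<cluster-id>", "<task-group-id>", 8)`. It only requests the new size; deciding when to call it is still up to your own scaling logic.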

Qubole automatically takes care of all of these things out of the box. You run your jobs from the UI or API and it starts, sizes, or resizes the cluster. When you're finished, it downsizes or terminates the cluster. It also allows you to keep a minimum number of nodes running at all times. I've also heard that the startup time for Qubole nodes is significantly faster than for EMR.

Hope this helps you.
