Hadoop Capacity Scheduler and Spark


Problem description


If I define CapacityScheduler Queues in yarn as explained here

http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html

How do I make use of it?


I want to run Spark jobs, but they should not take up the whole cluster; instead, they should execute in a CapacityScheduler queue that has a fixed set of resources allocated to it.


Is that possible, specifically on the Cloudera platform (given that Spark on Cloudera runs on YARN)?

Recommended answer

  1. Configure the CapacityScheduler as you need by editing capacity-scheduler.xml. You also need to set yarn.resourcemanager.scheduler.class in yarn-site.xml to org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler, which is also the default on current Hadoop versions (see the configuration sketch after this list).
  2. Submit your Spark jobs to the designated queue.
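
For reference, here is a minimal sketch of the two files, assuming a child queue named thequeue under root; the queue name and the capacity percentages are illustrative, not prescribed:

<!-- yarn-site.xml: select the CapacityScheduler (the default on current Hadoop versions) -->
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>

<!-- capacity-scheduler.xml: declare thequeue next to default and cap it at 25% of the cluster -->
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>default,thequeue</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.default.capacity</name>
  <value>75</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.thequeue.capacity</name>
  <value>25</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.thequeue.maximum-capacity</name>
  <value>25</value>
</property>

Note that the capacities under root must sum to 100. With maximum-capacity equal to capacity the queue is hard-capped; setting maximum-capacity higher would let the queue borrow idle resources from other queues.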

For example:

$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn \
    --deploy-mode cluster \
    --driver-memory 4g \
    --executor-memory 2g \
    --executor-cores 1 \
    --queue thequeue \
    lib/spark-examples*.jar \
    10

The --queue flag indicates the queue to which you will submit; it should match your CapacityScheduler configuration.
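
Alternatively, Spark on YARN also reads the target queue from the spark.yarn.queue property (which defaults to default), so a sketch of the same submission using --conf instead of --queue would be:

$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn \
    --deploy-mode cluster \
    --conf spark.yarn.queue=thequeue \
    lib/spark-examples*.jar \
    10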

