Spark on Yarn: How to prevent multiple spark jobs being scheduled
Question
With Spark on YARN, I don't see a way to prevent concurrent jobs from being scheduled. My architecture is set up for purely batch processing.
I need this for the following reasons:
- Resource constraints
- The UserCache for Spark grows very quickly. Running multiple jobs causes an explosion in cache space.
Ideally, I'd love to know if there is a config that would ensure only one job runs on YARN at any time.
Answer
You can create a queue that can host only one application master, and run all Spark jobs on that queue. Then, if one Spark job is running, any other submitted job will be accepted, but it won't be scheduled and run until the running job has finished.
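One way to set this up is with the YARN Capacity Scheduler in `capacity-scheduler.xml`. This is a sketch, not from the original answer: the queue name `batch` and the percentage values are illustrative assumptions. The idea is to cap the queue's ApplicationMaster resource share so low that only one AM fits at a time; additional jobs are accepted into the queue but stay pending:

```xml
<!-- capacity-scheduler.xml (sketch): a dedicated "batch" queue for Spark jobs.
     Queue name and percentages here are illustrative assumptions. -->
<configuration>
  <property>
    <!-- Add a "batch" queue alongside the default queue. -->
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>default,batch</value>
  </property>
  <property>
    <!-- Share of cluster capacity given to the batch queue. -->
    <name>yarn.scheduler.capacity.root.batch.capacity</name>
    <value>50</value>
  </property>
  <property>
    <!-- Cap the fraction of the queue usable by ApplicationMasters so that
         only one AM can run at a time; further jobs wait in ACCEPTED state. -->
    <name>yarn.scheduler.capacity.root.batch.maximum-am-resource-percent</name>
    <value>0.1</value>
  </property>
</configuration>
```

All Spark jobs would then be submitted to that queue, e.g. `spark-submit --master yarn --queue batch ...`. The exact `maximum-am-resource-percent` value depends on your AM memory settings and queue size, so it needs tuning for your cluster.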