Running a scheduled Spark job

Question

I have a Spark job which reads a source table, does a number of map / flatten / reduce operations and then stores the results into a separate table we use for reporting. Currently this job is run manually using the spark-submit script. I want to schedule it to run every night so the results are pre-populated for the start of the day. Do I:

  1. Set up a cron job to call the spark-submit script?
  2. Add scheduling into my job class, so that it is submitted once but performs the actions every night?
  3. Is there a built-in mechanism in Spark or a separate script that will help me do this?

We are running Spark in Standalone mode.
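For reference, a manual submission in standalone mode would look something like the sketch below; the class name, master URL, jar path, and install location are hypothetical placeholders, not details from the question.

```bash
# Hypothetical manual invocation; every name and path here is a placeholder.
# --master points at the standalone cluster's master process.
/opt/spark/bin/spark-submit \
  --class com.example.reports.NightlyReportJob \
  --master spark://spark-master:7077 \
  /opt/jobs/nightly-report.jar
```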

Any advice appreciated!

Answer

There is no built-in mechanism in Spark that will help. A cron job seems reasonable for your case. If you find yourself continuously adding dependencies to the scheduled job, try Azkaban.
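To make option 1 concrete, here is a minimal sketch of the cron approach; the script, log, and jar paths, the class name, and the master URL are all assumptions, not from the answer. Since crontab entries must fit on a single line and cron runs with a minimal environment, it helps to wrap the spark-submit call in a small script with absolute paths:

```bash
#!/usr/bin/env bash
# nightly-report.sh -- hypothetical wrapper around the spark-submit call.
# All paths and names below are placeholders; adjust for your deployment.
set -euo pipefail

/opt/spark/bin/spark-submit \
  --class com.example.reports.NightlyReportJob \
  --master spark://spark-master:7077 \
  /opt/jobs/nightly-report.jar
```

The corresponding crontab entry (installed with `crontab -e`) runs the wrapper at 01:00 every night and appends the driver output to a log, assuming the cron user can write to that path:

```bash
0 1 * * * /opt/jobs/nightly-report.sh >> /var/log/nightly-report.log 2>&1
```

One design note: keeping the schedule in cron (rather than a sleep loop inside the job, as in option 2) means the cluster holds no resources between runs, and a failed run does not block the next night's execution.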
