Capture console output of a Spark action node in Oozie as a variable across the Oozie workflow

Question

Is there a way to capture the console output of a Spark job in Oozie? I want to use a specific printed value in the next action node after the Spark job.

I was thinking that I could maybe use ${wf:actionData("action-id")["Variable"]}, but it seems that Oozie cannot capture output from a Spark action node. In a Shell action (with <capture-output/> enabled), by contrast, you can just echo "var=12345" and then call wf:actionData in Oozie to use the value as an Oozie variable across the workflow.
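The Shell-action mechanism has a JVM counterpart that the Spark action lacks: Oozie's Java action also accepts <capture-output/>, and instead of echoing to stdout, its main class publishes values by writing a Java properties file to the path Oozie passes in the `oozie.action.output.properties` system property. The Scala sketch below only illustrates that contract; the object name and the `records` key are hypothetical, and it does not make the Spark action capture anything.

```scala
import java.io.{File, FileOutputStream}
import java.util.Properties

// Sketch of the output contract of an Oozie *Java* action (not the Spark action)
// whose node declares <capture-output/>. Names here are illustrative only.
object CaptureOutputSketch {
  // System property Oozie sets to the file where captured values must be written.
  val OutputPropsKey = "oozie.action.output.properties"

  // Write the given key/value pairs as a Java properties file at that path.
  def emit(values: Map[String, String]): Unit = {
    val path = Option(System.getProperty(OutputPropsKey)).getOrElse(
      throw new IllegalStateException(
        s"$OutputPropsKey is not set; is this running as an Oozie Java action with <capture-output/>?"))
    val props = new Properties()
    values.foreach { case (k, v) => props.setProperty(k, v) }
    val out = new FileOutputStream(new File(path))
    try props.store(out, "values exposed through wf:actionData")
    finally out.close()
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical value; a real job would compute it.
    val processed = 12345L
    emit(Map("records" -> processed.toString))
  }
}
```

If the node were named java-node (hypothetical), later nodes could read the value with ${wf:actionData('java-node')['records']}.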

I want this because I want to print the number of records processed, store it as an Oozie variable, and use it in the next action nodes of the workflow, without anything that requires storing the data outside the workflow, such as saving it in a table or storing it as a system variable from inside the Spark Scala program.

Any help would be greatly appreciated, since I'm still a novice Spark programmer. Thank you very much.

Solution

As the Spark action does not support capture-output, you'll have to write the data to a file on HDFS. This post explains how to do that from Spark.
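A minimal sketch of that approach, assuming the Hadoop client classes that ship with Spark are on the classpath: the Spark driver writes the value as a key=value line to one file through the Hadoop FileSystem API. The object name, the output path and the records key are hypothetical.

```scala
import java.io.{BufferedWriter, OutputStreamWriter}
import java.nio.charset.StandardCharsets
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Sketch: persist values computed in a Spark job as "key=value" lines in a single
// file so that a later workflow action can read them back.
object HdfsProps {
  // Render the map as properties-style lines.
  // Assumes keys and values contain no '=', ':' or newline characters.
  def render(values: Map[String, String]): String =
    values.map { case (k, v) => s"$k=$v\n" }.mkString

  // Overwrite `outPath` (an hdfs:// or scheme-less path) with the rendered lines.
  def write(values: Map[String, String], outPath: String, conf: Configuration): Unit = {
    val path = new Path(outPath)
    val fs = path.getFileSystem(conf)
    val writer = new BufferedWriter(
      new OutputStreamWriter(fs.create(path, true), StandardCharsets.UTF_8))
    try writer.write(render(values))
    finally writer.close()
  }
}

// In the Spark driver (df and the path are hypothetical):
//   val processed = df.count()
//   HdfsProps.write(Map("records" -> processed.toString),
//     "/tmp/my-wf/records.properties", spark.sparkContext.hadoopConfiguration)
```

Because the file already holds key=value lines, a following Shell action with <capture-output/> could simply run hdfs dfs -cat on it (assuming the hdfs client is available on that node), after which ${wf:actionData('read-count')['records']} (node name hypothetical) is available to the rest of the workflow.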
