How to configure an Oozie coordinator dataset for the previous day


Question

I want to run a workflow based on the availability of control files for the previous date. The directory layout is ${basePath}/YYYYMMdd/00/_Complete, and I want to check for the _Complete file inside the 00 directory. My job runs daily on the previous day's data. I tried the options suggested in similar questions, but it still does not work. When I test against same-day data using the instance value below, it works, but not with the (-1) option. Is there any restriction on URI-TEMPLATE formats, meaning does the path need a fixed format such as path/${YEAR}${MONTH}${DAY}/Complete? Please help.

<instance>${coord:current(0)}</instance>

Here is the dry-run output for my coordinator job.

    ***coordJob after parsing: ***
<coordinator-app xmlns="uri:oozie:coordinator:0.1" name="my_Scheduler_5f" frequency="1" start="2016-08-17T23:40Z" end="2016-08-19T23:45Z" timezone="America/Los_Angeles" freq_timeunit="DAY" end_of_duration="NONE">
  <controls>
    <timeout>30</timeout>
  </controls>
  <input-events>
    <data-in name="coordInput_1" dataset="input1">
      <dataset name="input1" frequency="1" initial-instance="2016-08-17T00:00Z" timezone="America/Los_Angeles" freq_timeunit="DAY" end_of_duration="NONE">
        <uri-template>${nameNode}/myHdfsPath/Finalpath1/${YEAR}${MONTH}${DAY}/00/</uri-template>
        <done-flag>_Complete</done-flag>
      </dataset>
      <instance>${coord:current(-1)}</instance>
    </data-in>
    <data-in name="coordInput_2" dataset="input2">
      <dataset name="input2" frequency="1" initial-instance="2016-08-17T23:00Z" timezone="America/Los_Angeles" freq_timeunit="DAY" end_of_duration="NONE">
        <uri-template>${nameNode}/myHdfsPath/Finalpath2/${YEAR}${MONTH}${DAY}/00/</uri-template>
        <done-flag>_Complete</done-flag>
      </dataset>
      <instance>${coord:current(-1)}</instance>
    </data-in>
  </input-events>
  <action>
    <workflow>
      <app-path>${nameNode}/myHdfsPath/My_POC/wf-app-dir</app-path>
      <configuration>
        <property>
          <name>date</name>
          <value>${coord:formatTime(coord:dateOffset(coord:actualTime(),-1,'DAY'), "yyyyMMdd")}</value>
        </property>
      </configuration>
    </workflow>
  </action>
</coordinator-app>
***actions for instance***

Recommended answer

I was able to get my job to look for the right _Complete flag by using separate <datasets> and <input-events> elements.

<datasets>
  <dataset name="input1" frequency="1" initial-instance="2016-08-17T00:00Z" timezone="America/Los_Angeles" freq_timeunit="DAY" end_of_duration="NONE">
    <uri-template>${nameNode}/myHdfsPath/Finalpath1/${YEAR}${MONTH}${DAY}/00/</uri-template>
    <done-flag>_Complete</done-flag>
  </dataset>
  ... input2 ...
</datasets>

<input-events>
  <data-in name="coordInput_1" dataset="input1">
    <instance>${coord:current(-1)}</instance>
  </data-in>
  ... coordInput_2 ...
</input-events>

current(-1) is the part that specifies yesterday (for a daily dataset). In my case, the problem was that I had copied an example that used current(0).
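
For context, here is a minimal sketch of how those fragments might sit inside a complete coordinator definition. The app name, start/end times, dataset, and app-path are copied from the question's dry-run output. The `inputDir` property name is hypothetical, and `frequency="${coord:days(1)}"` is an assumption about what produced the parsed `frequency="1"` with `freq_timeunit="DAY"`. Only input1 is shown.

```xml
<coordinator-app xmlns="uri:oozie:coordinator:0.1" name="my_Scheduler_5f"
                 frequency="${coord:days(1)}"
                 start="2016-08-17T23:40Z" end="2016-08-19T23:45Z"
                 timezone="America/Los_Angeles">
  <!-- Datasets are declared once, outside input-events. -->
  <datasets>
    <dataset name="input1" frequency="${coord:days(1)}"
             initial-instance="2016-08-17T00:00Z" timezone="America/Los_Angeles">
      <uri-template>${nameNode}/myHdfsPath/Finalpath1/${YEAR}${MONTH}${DAY}/00/</uri-template>
      <done-flag>_Complete</done-flag>
    </dataset>
  </datasets>
  <!-- input-events only references the dataset by name and picks the instance. -->
  <input-events>
    <data-in name="coordInput_1" dataset="input1">
      <!-- Assumed resolution for the action with nominal time 2016-08-18T23:40Z:
           current(0)  is the 2016-08-18T00:00Z instance,
           current(-1) is the 2016-08-17T00:00Z instance,
           so the action waits for .../Finalpath1/20160817/00/_Complete -->
      <instance>${coord:current(-1)}</instance>
    </data-in>
  </input-events>
  <action>
    <workflow>
      <app-path>${nameNode}/myHdfsPath/My_POC/wf-app-dir</app-path>
      <configuration>
        <property>
          <!-- Hypothetical property name: hands the resolved input URI to the workflow. -->
          <name>inputDir</name>
          <value>${coord:dataIn('coordInput_1')}</value>
        </property>
      </configuration>
    </workflow>
  </action>
</coordinator-app>
```

Passing the path with `coord:dataIn` is one option for keeping the workflow pointed at the same directory the coordinator waited on; the question's separate `date` property computed from `coord:actualTime()` could be kept instead.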
