How to solve a "File does not exist" error when setting up an Oozie coordinator
Question
How can I resolve a "File does not exist" error when setting up an Oozie coordinator?
The coordinator log shows this error:
Pig logfile dump:
Error: java.io.FileNotFoundException: File does not exist: /user/hdfs/jay/part-0.tmp
Coordinator definition:
<coordinator-app name="tes-ng" frequency="${coord:minutes(15)}"
                 start="2015-12-07T10:30+0700" end="2017-02-28T23:00+0700" timezone="Asia/Jakarta"
                 xmlns="uri:oozie:coordinator:0.1" xmlns:sla="uri:oozie:sla:0.1">
    <controls>
        <execution>LAST_ONLY</execution>
    </controls>
    <datasets>
        <dataset name="INPUT_DS" frequency="${coord:minutes(15)}"
                 initial-instance="2015-02-16T016:00+0700" timezone="Asia/Jakarta">
            <uri-template>${nameNode}/user/hdfs/jay/${YEAR}/${MONTH}/${DAY}/${HOUR}${MINUTE}</uri-template>
            <done-flag></done-flag>
        </dataset>
        <dataset name="OUTPUT_DS" frequency="${coord:minutes(15)}"
                 initial-instance="2015-02-16T16:00+0700" timezone="Asia/Jakarta">
            <uri-template>${nameNode}/user/hdfs/jay/output</uri-template>
            <done-flag></done-flag>
        </dataset>
    </datasets>
    <input-events>
        <data-in name="INPUT" dataset="INPUT_DS">
            <instance>${coord:current(-2)}</instance>
        </data-in>
    </input-events>
    <output-events>
        <data-out name="OUTPUT" dataset="OUTPUT_DS">
            <instance>${coord:current(-2)}</instance>
        </data-out>
    </output-events>
    <action>
        <workflow>
            <app-path>${appFolder}</app-path>
            <configuration>
                <property>
                    <name>INPUT</name>
                    <value>${coord:dataIn('INPUT')}</value>
                </property>
                <property>
                    <name>OUTPUT</name>
                    <value>${coord:dataOut('OUTPUT')}</value>
                </property>
            </configuration>
        </workflow>
    </action>
</coordinator-app>
What I want is for Oozie to hold the action until the file is ready, instead of failing with "File does not exist". Any ideas?
Thanks.
Recommended answer
The usual way to do this is to have a proper data dependency. The process that creates your input data writes a file that signals the data is complete (e.g. _SUCCESS). If you define a <done-flag> in your input dataset (e.g. _SUCCESS), Oozie will periodically check for the existence of this file and only start the workflow once it is available. With an empty <done-flag></done-flag>, as in your coordinator, Oozie only waits for the directory itself to exist, so the workflow can start while files in it are still being written.
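<dataset name="INPUT_DS" frequency="${coord:minutes(15)}"
         initial-instance="2015-02-16T16:00+0700" timezone="Asia/Jakarta">
    <!-- initial-instance: the question has "T016:00", which looks like a typo for "T16:00" -->
    <uri-template>${nameNode}/user/hdfs/jay/${YEAR}/${MONTH}/${DAY}/${HOUR}${MINUTE}</uri-template>
    <!-- Oozie waits until this file exists in the instance directory -->
    <done-flag>_SUCCESS</done-flag>
</dataset>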
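How long Oozie keeps an action waiting for the flag is governed by the coordinator's timeout control. The fragment below is a minimal sketch, assuming you want the action to wait indefinitely. The value -1 is chosen here for illustration and does not come from the question, and the behaviour when <timeout> is omitted depends on the Oozie server configuration.

```xml
<controls>
    <!-- Minutes a materialized action may wait for its input dependencies.
         -1 means no timeout (assumed choice for illustration). -->
    <timeout>-1</timeout>
    <execution>LAST_ONLY</execution>
</controls>
```

With LAST_ONLY, older waiting actions are skipped once a newer one becomes ready, so an unbounded timeout should not cause a backlog of runs in this setup.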
If you cannot have such a flag, then AFAIK the only option is to write your own input data check and plug it into Oozie (I've seen someone do that for Hive partitions).
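A lighter-weight variation on a custom check, different from plugging a checker into Oozie itself, is to guard the workflow with a decision node. The sketch below is only an illustration under stated assumptions: the workflow name, the script name process.pig, the jobTracker property, and the idea that an unfinished input shows up as part-0.tmp inside the input directory are all hypothetical. Note that this fails fast rather than holding the action, so the coordinator action would have to be rerun later.

```xml
<workflow-app name="check-input-wf" xmlns="uri:oozie:workflow:0.4">
    <start to="check-input"/>

    <!-- Run only if the input directory exists and the (assumed) temp file is gone -->
    <decision name="check-input">
        <switch>
            <case to="process">
                ${fs:exists(INPUT) and not fs:exists(concat(INPUT, '/part-0.tmp'))}
            </case>
            <default to="input-not-ready"/>
        </switch>
    </decision>

    <action name="process">
        <pig>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <!-- hypothetical script name -->
            <script>process.pig</script>
            <param>INPUT=${INPUT}</param>
            <param>OUTPUT=${OUTPUT}</param>
        </pig>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <kill name="input-not-ready">
        <message>Input ${INPUT} is missing or still being written</message>
    </kill>
    <kill name="fail">
        <message>Pig failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

INPUT and OUTPUT here are the properties the coordinator in the question already passes to the workflow.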
You should also double-check the initial-instance value, as it seems you've put an offset in there and then specified timezone=Asia/Jakarta on top of it.
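One way to avoid the ambiguity is to write the instant in UTC. This is a sketch only, assuming the intended first instance is 16:00 Jakarta time (UTC+7, i.e. 09:00 UTC) and that the Oozie server uses its default UTC processing timezone; neither is confirmed by the question.

```xml
<dataset name="INPUT_DS" frequency="${coord:minutes(15)}"
         initial-instance="2015-02-16T09:00Z" timezone="Asia/Jakarta">
    <!-- 2015-02-16T09:00Z is 16:00 in Asia/Jakarta (assumed intended start) -->
    <uri-template>${nameNode}/user/hdfs/jay/${YEAR}/${MONTH}/${DAY}/${HOUR}${MINUTE}</uri-template>
    <done-flag>_SUCCESS</done-flag>
</dataset>
```

Keep in mind that ${YEAR}, ${MONTH}, ${DAY}, ${HOUR} and ${MINUTE} in the uri-template are resolved in UTC, so the directory names would need to follow UTC as well.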