How to resolve a "File does not exist" error when setting up an Oozie coordinator


Problem description




How can I handle a "File does not exist" error in an Oozie coordinator?

I get this error in the coordinator log:

Pig logfile dump:

Backend error message

Error: java.io.FileNotFoundException: File does not exist: /user/hdfs/jay/part-0.tmp

Coordinator settings:

<coordinator-app name="tes-ng" frequency="${coord:minutes(15)}"
start="2015-12-07T10:30+0700" end="2017-02-28T23:00+0700" timezone="Asia/Jakarta"
xmlns="uri:oozie:coordinator:0.1" xmlns:sla="uri:oozie:sla:0.1">
<controls>
    <execution>LAST_ONLY</execution>
</controls>
<datasets>
    <dataset name="INPUT_DS" frequency="${coord:minutes(15)}"
        initial-instance="2015-02-16T016:00+0700" timezone="Asia/Jakarta">
        <uri-template>${nameNode}/user/hdfs/jay/${YEAR}/${MONTH}/${DAY}/${HOUR}${MINUTE}
        </uri-template>
        <done-flag></done-flag>
    </dataset>
    <dataset name="OUTPUT_DS" frequency="${coord:minutes(15)}"
        initial-instance="2015-02-16T16:00+0700" timezone="Asia/Jakarta">
        <uri-template>${nameNode}/user/hdfs/jay/output</uri-template>
        <done-flag></done-flag>
    </dataset>
</datasets>
<input-events>
    <data-in name="INPUT" dataset="INPUT_DS">
        <instance>${coord:current(-2)}</instance>
    </data-in>
</input-events>
<output-events>
    <data-out name="OUTPUT" dataset="OUTPUT_DS">
        <instance>${coord:current(-2)}</instance>
    </data-out>
</output-events>
<action>
    <workflow>
        <app-path>${appFolder}</app-path>
        <configuration>
            <property>
                <name>INPUT</name>
                <value>${coord:dataIn('INPUT')}</value>
            </property>
            <property>
                <name>OUTPUT</name>
                <value>${coord:dataOut('OUTPUT')}</value>
            </property>
        </configuration>
    </workflow>
</action>
</coordinator-app>

What I want is for Oozie to wait until the file is fully ready instead of failing with "File does not exist". Any ideas?

Thanks.

Solution

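The usual way to do this is to declare a proper data dependency. The process that creates your input data also writes a marker file that signals the data is complete (e.g. _SUCCESS). If you define a <done-flag> in your input dataset (e.g. _SUCCESS), Oozie will periodically check for the existence of this file and only start the workflow once it is available. With an empty <done-flag></done-flag>, as in your coordinator, Oozie treats the mere existence of the directory as the ready signal, so the workflow can start while files are still being written.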

<dataset name="INPUT_DS" frequency="${coord:minutes(15)}"
    initial-instance="2015-02-16T16:00+0700" timezone="Asia/Jakarta">
    <uri-template>${nameNode}/user/hdfs/jay/${YEAR}/${MONTH}/${DAY}/${HOUR}${MINUTE}</uri-template>
    <done-flag>_SUCCESS</done-flag>
</dataset>

If you cannot have such a flag, then AFAIK the only option is to write your own input data check and plug it into Oozie (I've seen someone do that for Hive partitions).
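One simple form of such a check, sketched here as an assumption rather than what the Hive-partition case did, is a decision node at the start of the workflow that uses the fs:exists EL function. Note that this is a different technique from a coordinator data dependency: it skips the run when the input is missing instead of waiting for it, and it only tests that the path exists, not that it is complete. The workflow name, the node names, the process.pig script, and ${jobTracker} are hypothetical; INPUT and OUTPUT are the properties passed in by the coordinator above.

```xml
<workflow-app name="tes-ng-wf" xmlns="uri:oozie:workflow:0.4">
    <start to="check-input"/>

    <!-- Run the Pig step only if the input path already exists in HDFS;
         otherwise finish without doing anything. -->
    <decision name="check-input">
        <switch>
            <case to="pig-step">${fs:exists(INPUT)}</case>
            <default to="end"/>
        </switch>
    </decision>

    <!-- Hypothetical Pig action; script name and parameters are placeholders. -->
    <action name="pig-step">
        <pig>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <script>process.pig</script>
            <param>INPUT=${INPUT}</param>
            <param>OUTPUT=${OUTPUT}</param>
        </pig>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>Pig step failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>

    <end name="end"/>
</workflow-app>
```

Because a skipped run still counts as a completed coordinator action, that time slot is not retried later. If the data must eventually be processed, the done-flag approach is the better fit.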

You should also double check the initial-instance value as it seems you've put an offset in there and then specified timezone=Asia/Jakarta on top of it.
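As a rough illustration, assuming the Oozie server runs with its default UTC processing timezone (which expects timestamps in the yyyy-MM-ddTHH:mmZ form), 16:00 at +0700 would be written as 09:00Z, with the timezone attribute left to describe the dataset's local zone. If your server is instead configured with a GMT+0700 processing timezone, the offset form is the expected one, so check your server settings before changing anything.

```xml
<!-- Sketch only: same dataset with the initial instance expressed in UTC.
     2015-02-16T16:00+0700 corresponds to 2015-02-16T09:00Z. -->
<dataset name="INPUT_DS" frequency="${coord:minutes(15)}"
    initial-instance="2015-02-16T09:00Z" timezone="Asia/Jakarta">
    <uri-template>${nameNode}/user/hdfs/jay/${YEAR}/${MONTH}/${DAY}/${HOUR}${MINUTE}</uri-template>
    <done-flag>_SUCCESS</done-flag>
</dataset>
```

The coordinator's start and end attributes would need the same treatment. It is also worth verifying whether the directory names your producer writes are in UTC or in local time, since the resolved ${HOUR} has to match them.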
