Oozie Sqoop action fails to import data to Hive


Problem description


I am facing an issue while executing an Oozie Sqoop action. The logs show that Sqoop imports the data into a temporary directory and then generates a Hive script to load it.

The job fails while loading the temporary data into Hive.

The logs do not show any exception.

Below is the Sqoop action I am using.

    <workflow-app name="testSqoopLoadWorkflow" xmlns="uri:oozie:workflow:0.4">
        <credentials>
            <credential name='hive_credentials' type='hcat'>
                <property>
                    <name>hcat.metastore.uri</name>
                    <value>${HIVE_THRIFT_URL}</value>
                </property>
                <property>
                    <name>hcat.metastore.principal</name>
                    <value>${KERBEROS_PRINCIPAL}</value>
                </property>
            </credential>
        </credentials>
        <start to="loadSqoopDataAction"/>
        <action name="loadSqoopDataAction" cred="hive_credentials">
            <sqoop xmlns="uri:oozie:sqoop-action:0.2">
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <job-xml>/tmp/hive-oozie-site.xml</job-xml>
                <configuration>
                    <property>
                        <name>oozie.hive.defaults</name>
                        <value>/tmp/hive-oozie-site.xml</value>
                    </property>
                </configuration>
                <command>job --meta-connect ${SQOOP_METASTORE_URL} --exec TEST_SQOOP_LOAD_JOB</command>
            </sqoop>
            <ok to="end"/>
            <error to="kill"/>
        </action>
        ...
    </workflow-app>

Below is the Sqoop job I am using to import the data.

    sqoop job --meta-connect ${SQOOP_METASTORE_URL} --create TEST_SQOOP_LOAD_JOB -- import --connect '${JDBC_URL}' --table testTable -m 1 --append --check-column pkId --incremental append --hive-import --hive-table testHiveTable;

The mapred logs show the following output.

    72285 [main] INFO  org.apache.sqoop.hive.HiveImport  - Loading uploaded data into Hive
    Intercepting System.exit(1)

    <<< Invocation of Main class completed <<<

    Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]

    Oozie Launcher failed, finishing Hadoop job gracefully

    Oozie Launcher ends

Any suggestions would be appreciated.

Solution

This looks like a typical Sqoop import into Hive: Sqoop has successfully imported the data into HDFS and is failing to load that data into Hive.

Here's some background on what's happening. Oozie launches a separate job (which can execute on any node in your Hadoop cluster) to run the Sqoop command. The Sqoop command starts another job to load the data into HDFS. Then, at the end of the Sqoop job, Sqoop runs a Hive script to load that data into Hive.

Since this can theoretically run on any node in your Hadoop cluster, the Hive CLI needs to be available on every node, and every node needs to talk to the same metastore. The Hive Metastore therefore needs to run in remote mode.
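
As a rough pre-flight check, a sketch like the one below could be run on each worker node to see whether the tools the launcher job relies on are on the PATH. `missing_tools` is a hypothetical helper name, not part of Oozie, Sqoop, or Hive, and the check only looks at the PATH of the user running it, which may differ from the environment a launcher task actually sees.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every command from the argument list
# that cannot be found on the current PATH, one per line.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Example call (commented out so the snippet has no side effects):
#   missing_tools hive sqoop hadoop
```

If the call prints nothing, all listed tools were found for that user on that node.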

The most common problem is that Sqoop cannot talk to the correct metastore. The usual causes are:

  1. The Hive Metastore service is not running. It should run in remote mode, started as a separate service. Here's a quick way to check whether it's running:

    service hive-metastore status

  2. hive-site.xml does not contain hive.metastore.uris. Here's an example hive-site.xml with hive.metastore.uris set:

    <configuration>
    ...
      <property>
        <name>hive.metastore.uris</name>
        <value>thrift://sqoop2.example.com:9083</value>
      </property>
    ...
    </configuration>
    

  3. hive-site.xml is not included in your Sqoop action (or its properties). Try adding your hive-site.xml to a <file> element in your Sqoop action. Here's an example workflow.xml with <file> in it:

    <workflow-app name="sqoop-to-hive" xmlns="uri:oozie:workflow:0.4">
        ...
        <action name="sqoop2hive">
            ...
            <sqoop xmlns="uri:oozie:sqoop-action:0.2">
                ...
                <file>/tmp/hive-site.xml#hive-site.xml</file>
            </sqoop>
            ...
        </action>
        ...
    </workflow-app>
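
The first two causes can be checked with a small script before the workflow is submitted. The sketch below pulls the value of `hive.metastore.uris` out of a hive-site.xml, splits a `thrift://host:port` URI into host and port, and tries to open a TCP connection to it. `metastore_uris`, `split_thrift_uri`, and `metastore_reachable` are hypothetical helper names. The `sed` expression is a plain text match rather than a real XML parser: it assumes that `<value>` directly follows `<name>` and holds a single URI without whitespace. The reachability check assumes `bash` and the coreutils `timeout` command are installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of hive.metastore.uris from the
# given hive-site.xml, or nothing if the property is absent.
metastore_uris() {
  # Join all lines so <name> and <value> can be matched in one pass.
  { tr -d '\n' < "$1"; echo; } |
    sed -n 's|.*<name>[[:space:]]*hive\.metastore\.uris[[:space:]]*</name>[[:space:]]*<value>[[:space:]]*\([^<[:space:]]*\)[[:space:]]*</value>.*|\1|p'
}

# Hypothetical helper: turn "thrift://host:port" into "host port".
split_thrift_uri() {
  local hostport="${1#thrift://}"
  printf '%s %s\n' "${hostport%%:*}" "${hostport##*:}"
}

# Hypothetical helper: succeed if a TCP connection to host ($1) and
# port ($2) can be opened within 5 seconds (uses bash's /dev/tcp).
metastore_reachable() {
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Example calls (commented out; the config path is an assumption):
#   uri=$(metastore_uris /etc/hive/conf/hive-site.xml)
#   set -- $(split_thrift_uri "$uri")
#   metastore_reachable "$1" "$2" && echo "metastore reachable"
```

An empty result from `metastore_uris` points to cause 2, while a non-zero status from `metastore_reachable` suggests that nothing is listening at the address hive-site.xml names, which is consistent with cause 1.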
    
