How can we automate incremental import in SQOOP?


Problem description

    How can we automate incremental imports in Sqoop?

    In an incremental import, we have to pass --last-value so that the import starts from the last imported value onwards. My job imports from an RDBMS frequently, and I don't want to supply the last value by hand each time. Is there any way to automate this?

    Solution

    An alternative approach to @Durga Viswanath Gadiraju's answer.

    If you are importing the data into a Hive table, you can query the last updated value from that Hive table and pass it to the sqoop import command. You can do this with a shell script or with Oozie actions.

    Shell script:

    # Tweak the selection query to match your logic.
    lastupdatedvalue=$(hive -S -e 'select last_value from table')

    # Note: --incremental also needs --check-column, naming the column that --last-value refers to.
    sqoop import --connect jdbc:mysql://localhost:3306/ydb --table yloc --username root -P --incremental append --last-value "${lastupdatedvalue}"
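
    The two commands above can be wrapped in a small script that refuses to start the import when Hive returns nothing usable. This is only a minimal sketch under stated assumptions: the function names `is_valid_last_value` and `run_incremental_import`, the numeric check column `id`, the query `select max(id) from yloc`, and the password file path are all illustrative and not part of the original answer. `HIVE_CMD` and `SQOOP_CMD` are hypothetical hooks so the script can be dry-run with stand-ins.

```shell
#!/usr/bin/env bash
# Sketch: fetch the last imported value from Hive, validate it, then run Sqoop.
# Assumptions (not in the original answer): a numeric check column named "id",
# a Hive table named "yloc", and a password file instead of the interactive -P.

HIVE_CMD="${HIVE_CMD:-hive}"      # overridable so the flow can be dry-run
SQOOP_CMD="${SQOOP_CMD:-sqoop}"

# Succeeds only if $1 is a non-empty string of digits.
is_valid_last_value() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
    *) return 0 ;;
  esac
}

run_incremental_import() {
  local last
  # -S keeps Hive quiet so that only the query result is captured.
  last="$("$HIVE_CMD" -S -e 'select max(id) from yloc')" || return 1
  # Strip whitespace and newlines around the value.
  last="$(printf '%s' "$last" | tr -d '[:space:]')"
  if ! is_valid_last_value "$last"; then
    echo "unexpected last value: '$last'" >&2
    return 1
  fi
  "$SQOOP_CMD" import \
    --connect jdbc:mysql://localhost:3306/ydb \
    --table yloc --username root \
    --password-file /user/root/.sqoop.pwd \
    --incremental append --check-column id --last-value "$last"
}
```

    Calling `run_incremental_import` from cron or another scheduler then needs no manual input. An empty table makes the query return NULL, which fails validation, so the first load has to be handled separately.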
    

    Oozie approach:

    1. A Hive action that runs the select query (based on your logic) to retrieve the last updated value.
    2. A Sqoop action that does the incremental load using the captured output of the previous Hive action.

    Please find below a pseudo workflow:

    <!-- Pseudo workflow. wf:actionData() returns a map of the key=value pairs captured
         from an action's output; the key "last_value" used below is an assumption.
         If your hive-action schema rejects <capture-output/>, a shell action that runs
         the query is a common substitute. -->
    <workflow-app name="sqoop-to-hive" xmlns="uri:oozie:workflow:0.4">
        <start to="hiveact"/>
        <action name="hiveact">
            <hive xmlns="uri:oozie:hive-action:0.2">
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <configuration>
                    <property>
                        <name>mapred.job.queue.name</name>
                        <value>${queueName}</value>
                    </property>
                </configuration>
                <script>script.sql</script>
                <capture-output/>
            </hive>
            <ok to="sqoopact"/>
            <error to="kill"/>
        </action>
        <action name="sqoopact">
            <sqoop xmlns="uri:oozie:sqoop-action:0.2">
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <command>import --connect jdbc:mysql://localhost:3306/ydb --table yloc --username root -P --incremental append --last-value ${wf:actionData('hiveact')['last_value']}</command>
            </sqoop>
            <ok to="end"/>
            <error to="kill"/>
        </action>
        <kill name="kill">
            <message>Action failed</message>
        </kill>
        <end name="end"/>
    </workflow-app>
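
    With `<capture-output/>`, Oozie expects the action to write Java-properties style `key=value` lines to standard output, and `wf:actionData` exposes those pairs to later actions. If the value is fetched from a shell action rather than the Hive action, a script along these lines could produce it. This is a sketch only: the function name `emit_last_value`, the key `last_value`, the query, and the `HIVE_CMD` override are assumptions, not part of the original answer.

```shell
#!/usr/bin/env bash
# Sketch of a script for an Oozie shell action that uses <capture-output/>.
# It prints exactly one key=value line on stdout for wf:actionData to pick up.
# Assumptions: Hive table "yloc" with a numeric column "id".

HIVE_CMD="${HIVE_CMD:-hive}"   # overridable so the script can be tried without Hive

emit_last_value() {
  local value
  # Keep only the last output line and strip surrounding whitespace.
  value="$("$HIVE_CMD" -S -e 'select max(id) from yloc' | tail -n 1 | tr -d '[:space:]')"
  if [ -z "$value" ] || [ "$value" = "NULL" ]; then
    echo "no last value found" >&2
    return 1
  fi
  printf 'last_value=%s\n' "$value"
}

# When used as the action's script, call emit_last_value as the final line.
```

    Diagnostics go to stderr so that stdout carries only the property line; the Sqoop action can then read the value by its key.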
    

    Hope this helps.
