Questions about Oozie/Sqoop

Question

I have a few questions:

1. Why does Sqoop run a MapReduce process to load data from HDFS into MySQL?

e.g.

Data is in HDFS on directory: /foo/bar

To load the data into the MySQL bar table, why is a MapReduce process needed?

sqoop export --connect jdbc:mysql://localhost/hduser --table foo -m 1 --export-dir /foo/bar

After I enter the above command, a MapReduce process runs.

2. How can I enable/disable keys in MySQL using Sqoop/Oozie?

Since a huge amount of data is being loaded into MySQL, we need to enable/disable keys. How do I achieve this?

3. How do I run multiple Oozie jobs in parallel?

4. How do I run Oozie jobs in cron?

You can answer 1 or more questions.

Thank you.

Solution

I'll go through your questions one by one. Feel free to ask more questions in the comments and I will elaborate on the things that are unclear to you.

1. Why does Sqoop run a MapReduce process to load data from HDFS into MySQL?

This is because Sqoop is built on MapReduce. Files in HDFS are split into blocks, and these blocks are stored across the cluster (some of them may sit on the same node). So it makes sense to run the export as a MapReduce job (in practice a map-only job) whose map tasks read these blocks in parallel and write them to MySQL. The -m option sets how many map tasks are used; with -m 1, as in your command, a single map task performs the whole export.
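To make the idea concrete, here is a toy Python model of a map-only export. It is not Sqoop's code. Assumptions: a list of "id,name" lines stands in for the files under --export-dir, contiguous slices of that list stand in for input splits, Python's built-in sqlite3 stands in for MySQL, and the function names (make_splits, map_task, export) are hypothetical. Each "map task" opens its own connection and inserts its split independently, which is what lets the work run in parallel; num_mappers plays the role of -m.

```python
# Toy model of a map-only export (NOT Sqoop's implementation).
# Assumptions: a list of "id,name" lines stands in for the files under
# --export-dir, contiguous slices stand in for input splits, and sqlite3
# stands in for MySQL. All function names here are hypothetical.
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor


def make_splits(lines, num_mappers):
    """Divide the input into at most num_mappers contiguous splits."""
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    if not lines:
        return []
    size = -(-len(lines) // num_mappers)  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def map_task(db_path, split):
    """One 'map task': parse its split and insert the rows in one transaction."""
    rows = [tuple(line.rstrip("\n").split(",", 1)) for line in split]
    conn = sqlite3.connect(db_path, timeout=30)  # each task has its own connection
    try:
        with conn:  # commits on success, rolls back on error
            conn.executemany("INSERT INTO foo (id, name) VALUES (?, ?)", rows)
    finally:
        conn.close()
    return len(rows)


def export(db_path, lines, num_mappers):
    """Run one map task per split concurrently; return the total rows written."""
    splits = make_splits(lines, num_mappers)
    if not splits:
        return 0
    with ThreadPoolExecutor(max_workers=len(splits)) as pool:
        return sum(pool.map(lambda split: map_task(db_path, split), splits))


if __name__ == "__main__":
    db_path = os.path.join(tempfile.mkdtemp(), "hduser.db")
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE foo (id INTEGER, name TEXT)")
    conn.commit()
    conn.close()

    data = [f"{i},name{i}" for i in range(10)]
    print(export(db_path, data, num_mappers=4))  # 10 (rows written by 4 tasks)
```

More map tasks mean more concurrent connections to the database, so in a real export the useful value of -m depends on how much parallel write load MySQL can absorb.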

2. How can I enable/disable keys in MySQL using Sqoop/Oozie?

I don't know the answer to this one. However, I feel that your question is a little ambiguous. Please try adding some more details, and if I find something I'll get back to you on this.

3. How do I run multiple Oozie jobs in parallel?

Each Oozie job is defined by a workflow.xml and a job.properties file.

  • If you're talking about manually executing multiple Oozie workflows (jobs), you can simply run the command that starts an Oozie job once for each job you want to run in parallel. The command returns as soon as the job has been submitted, so jobs started this way run concurrently. Sample command: oozie job -config job.properties -run

  • If you're talking about running multiple actions within an Oozie workflow in parallel, you can use a fork to start multiple actions in parallel and then a join where the parallel actions meet once they complete. Example:

    <fork name="sampleFork">
       <path start="sampleAction1"/>
       <path start="sampleAction2"/>
    </fork>

    <action name="sampleAction1">
      <!-- ... action definition ... -->
      <ok to="joinActions"/>
      <error to="fail"/>
    </action>

    <!-- sampleAction2 is defined the same way and also goes to joinActions -->

    <join name="joinActions" to="seqAction3"/>
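The fork/join behaviour can be modelled in a few lines of Python. This is a toy sketch, not how Oozie executes actions: actions are plain callables, the node names mirror the workflow fragment (sampleAction1, sampleAction2, seqAction3, fail), and failure handling is simplified to "if any forked action raises, go to the error node". The function name run_fork_join is hypothetical.

```python
# Toy model of Oozie fork/join semantics (not Oozie's implementation).
# Actions are plain Python callables; node names mirror the workflow fragment.
from concurrent.futures import ThreadPoolExecutor


def run_fork_join(fork_paths, actions, join_to, error_to="fail"):
    """Start every fork path in parallel, wait for all of them (the join),
    then return the name of the next node to transition to."""
    with ThreadPoolExecutor(max_workers=len(fork_paths)) as pool:
        futures = [pool.submit(actions[name]) for name in fork_paths]
        # Collecting every exception blocks until all actions have finished:
        # this is the "join" point.
        errors = [f.exception() for f in futures]
    # Simplification: any failed action sends the flow to error_to,
    # loosely mirroring <error to="fail"/>.
    return error_to if any(e is not None for e in errors) else join_to


if __name__ == "__main__":
    finished = []
    actions = {
        "sampleAction1": lambda: finished.append("sampleAction1"),
        "sampleAction2": lambda: finished.append("sampleAction2"),
    }
    next_node = run_fork_join(["sampleAction1", "sampleAction2"], actions, "seqAction3")
    print(next_node, sorted(finished))  # seqAction3 ['sampleAction1', 'sampleAction2']
```

The key property is that nothing after the join runs until every forked path has finished, which is also why each action's ok transition must point at the join node.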
    

4. How do I run Oozie jobs in cron?

If you want to automate the execution of Oozie jobs, I suggest you look into the Oozie coordinator. With an Oozie coordinator, you can schedule workflows to be triggered at a fixed interval (every 10 minutes, 1 hour, 1 day, etc.).
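A time-based coordinator triggers the workflow at its start time, then at start + frequency, start + 2 × frequency, and so on until its end time. The sketch below enumerates those trigger (nominal) times in Python. It is a toy model, not Oozie code, and it assumes a fixed frequency (Oozie's day- and month-based frequencies can be affected by time zones and DST, which this ignores) and an exclusive end time. The function name nominal_times and the dates are illustrative only.

```python
# Toy model of a time-based coordinator schedule (not Oozie code).
# Assumptions: fixed frequency, end time exclusive, all times in UTC.
from datetime import datetime, timedelta, timezone


def nominal_times(start, end, frequency):
    """Return start, start + frequency, ... for every time strictly before end."""
    if frequency <= timedelta(0):
        raise ValueError("frequency must be positive")
    times = []
    t = start
    while t < end:
        times.append(t)
        t += frequency
    return times


if __name__ == "__main__":
    start = datetime(2014, 1, 1, 0, 0, tzinfo=timezone.utc)
    end = datetime(2014, 1, 1, 1, 0, tzinfo=timezone.utc)
    for t in nominal_times(start, end, timedelta(minutes=10)):
        print(t.strftime("%Y-%m-%dT%H:%MZ"))  # 00:00, 00:10, ..., 00:50
```

Unlike a crontab entry, the coordinator keeps this schedule on the Oozie server, so the workflow runs are tracked and managed by Oozie itself.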
