Running a Hadoop jar using Luigi (Python)


Problem Description

I need to run a Hadoop jar job using Luigi from Python. I searched and found examples of writing a mapper and reducer in Luigi, but nothing that directly runs a Hadoop jar.

I need to run an already compiled Hadoop jar directly. How can I do it?

Solution

You need to use the luigi.contrib.hadoop_jar package (code).

In particular, you need to extend HadoopJarJobTask. For example, like this:

from luigi.contrib.hadoop_jar import HadoopJarJobTask
from luigi.contrib.hdfs.target import HdfsTarget

class TextExtractorTask(HadoopJarJobTask):
    def output(self):
        # HDFS directory the Hadoop job writes its results to
        return HdfsTarget('data/processed/')

    def jar(self):
        # path to the compiled Hadoop jar to submit
        return 'jobfile.jar'

    def main(self):
        # fully qualified main class inside the jar
        return 'com.ololo.HadoopJob'

    def args(self):
        # extra command-line arguments passed to the job
        return ['--param1', '1', '--param2', '2']
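
To actually trigger the task, you can use Luigi's scheduler either programmatically or from the command line. A minimal sketch, assuming the task above lives in a module named text_extractor.py (the module name is only an illustration):

import luigi
from text_extractor import TextExtractorTask  # hypothetical module containing the task above

if __name__ == '__main__':
    # local_scheduler=True runs the task without a central luigid daemon
    luigi.build([TextExtractorTask()], local_scheduler=True)

Equivalently, from the shell: python -m luigi --module text_extractor TextExtractorTask --local-scheduler. Luigi considers the task done once the output() target exists on HDFS, so re-running it is a no-op until that target is removed.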

You can also include building the jar file with Maven in the workflow:

import luigi
from luigi.contrib.hadoop_jar import HadoopJarJobTask
from luigi.contrib.hdfs.target import HdfsTarget
from luigi.file import LocalTarget

import subprocess
import os

class BuildJobTask(luigi.Task):
    def output(self):
        # the jar produced by the Maven build
        return LocalTarget('target/jobfile.jar')

    def run(self):
        # build the jar (skipping tests) before the Hadoop task runs
        subprocess.call(['mvn', 'clean', 'package', '-DskipTests'])

class YourHadoopTask(HadoopJarJobTask):
    def output(self):
        return HdfsTarget('data/processed/')

    def jar(self):
        # path of the jar produced by BuildJobTask
        return self.input().fn

    def main(self):
        return 'com.ololo.HadoopJob'

    def args(self):
        return ['--param1', '1', '--param2', '2']

    def requires(self):
        return BuildJobTask()
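
If the hard-coded '--param1'/'--param2' values need to change between runs, they can be declared as Luigi parameters and forwarded from args(). A minimal sketch under that assumption (the task and parameter names are illustrative, not part of the original answer):

import luigi
from luigi.contrib.hadoop_jar import HadoopJarJobTask
from luigi.contrib.hdfs.target import HdfsTarget

class ParametrizedHadoopTask(HadoopJarJobTask):
    # hypothetical Luigi parameters, settable from the command line
    param1 = luigi.Parameter(default='1')
    param2 = luigi.Parameter(default='2')

    def output(self):
        return HdfsTarget('data/processed/')

    def jar(self):
        return 'jobfile.jar'

    def main(self):
        return 'com.ololo.HadoopJob'

    def args(self):
        # forward the Luigi parameters to the jar's command-line arguments
        return ['--param1', self.param1, '--param2', self.param2]

This can then be invoked as, for example, python -m luigi --module your_module ParametrizedHadoopTask --param1 5 --param2 10 --local-scheduler.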

