How do I pass a parameter to a python Hadoop streaming job?


Problem description

For a python Hadoop streaming job, how do I pass a parameter to, say, the reducer script so that it behaves differently based on the parameter being passed in?



I understand that streaming jobs are called in the format:



hadoop jar hadoop-streaming.jar -input -output -mapper mapper.py -reducer reducer.py ...



I want to affect reducer.py.

Solution

The argument to the command-line option -reducer can be any command, so you can try:

$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
    -input inputDirs \
    -output outputDir \
    -mapper myMapper.py \
    -reducer 'myReducer.py 1 2 3' \
    -file myMapper.py \
    -file myReducer.py

assuming myReducer.py is made executable. Disclaimer: I have not tried this exact command, but I have passed similar complex strings to -mapper and -reducer before.



That said, have you tried the

-cmdenv name=value

option, and just have your Python reducer get its value from the environment? It's just another way of doing things.
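A minimal sketch of the environment-variable approach, assuming a hypothetical variable name MY_THRESHOLD set on the command line with -cmdenv MY_THRESHOLD=10:

```python
#!/usr/bin/env python
# Hypothetical reducer fragment: reads its parameter from the
# environment instead of from sys.argv.
import os

def get_threshold(default=0):
    # Fall back to a default when the variable is unset, so the script
    # still runs when tested locally outside of Hadoop.
    return int(os.environ.get("MY_THRESHOLD", default))
```

An advantage of -cmdenv is that the same variable is visible to both the mapper and the reducer without changing how either script is invoked.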



