Hadoop global variable with streaming
Problem description
I understand that I can give some global value to my mappers via the Job and the Configuration. But how can I do that using Hadoop Streaming (Python in my case)? What is the right way?
Based on the docs, you can specify a command line option (-cmdenv name=value) to set environment variables on each distributed machine, which you can then use in your mappers/reducers:
$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
-input input.txt \
-output output.txt \
-mapper mapper.py \
-reducer reducer.py \
-file mapper.py \
-file reducer.py \
-cmdenv MY_PARAM=thing_I_need
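Inside the mapper, the variable set with -cmdenv is available through the process environment. A minimal sketch of what mapper.py might look like (the tab-separated output and the idea of tagging each record with the parameter are illustrative assumptions, not part of the original question):

```python
#!/usr/bin/env python
# mapper.py -- streaming mapper that reads MY_PARAM from the environment
import os
import sys

def map_line(line, param):
    """Illustrative map logic: tag each input line with the parameter value."""
    return "%s\t%s" % (line.strip(), param)

if __name__ == "__main__":
    # -cmdenv MY_PARAM=thing_I_need makes the value visible here
    my_param = os.environ.get("MY_PARAM", "default")
    for line in sys.stdin:
        # Hadoop Streaming treats text before the first tab as the key
        print(map_line(line, my_param))
```

The reducer can read the same variable the same way, since -cmdenv sets it for every task launched on the distributed machines.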