Managing dependencies with Hadoop Streaming?


Question


I have a quick Hadoop Streaming question. If I'm using Python streaming and my mappers/reducers require Python packages that aren't installed by default, do I need to install those on all the Hadoop machines as well, or is there some sort of serialization that sends them to the remote machines?

Answer


If they're not installed on your task boxes, you can send them with -file. If you need a package or other directory structure, you can send a zip file, which will be unpacked for you. Here's a Hadoop 0.17 invocation:

$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-0.17.0-streaming.jar \
    -mapper mapper.py -reducer reducer.py \
    -input input/foo -output output \
    -file /tmp/foo.py -file /tmp/lib.zip


However, see this issue for a caveat:

https://issues.apache.org/jira/browse/MAPREDUCE-596
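On the task side, the mapper has to make the shipped zip importable before using anything inside it; Python can import pure-Python modules directly from a zip placed on sys.path. A minimal word-count-style mapper.py sketch (the lib.zip name matches the invocation above; the function name is illustrative):

```python
#!/usr/bin/env python
import sys

# -file drops lib.zip into the task's working directory; prepending it
# to sys.path lets the mapper import pure-Python modules shipped inside.
sys.path.insert(0, 'lib.zip')

def map_words(stream):
    """Emit (word, 1) pairs for each whitespace-separated token."""
    for line in stream:
        for word in line.split():
            yield word, 1

if __name__ == '__main__':
    # Standard streaming contract: read records from stdin,
    # write tab-separated key/value pairs to stdout.
    for word, count in map_words(sys.stdin):
        sys.stdout.write('%s\t%d\n' % (word, count))
```

Note this trick only covers pure-Python code; packages with compiled extensions cannot be imported from a zip and would still need to be installed on (or shipped to and unpacked on) the task nodes.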

