Hadoop: How to include third party library in Python MapReduce

Views: 495 | Posted: 2018/5/31 20:09:49 | Tags: python hadoop mapreduce


Question

I am writing a MapReduce job in Python and want to use some third-party libraries, such as chardet.

I know that we can use the option -libjars=... to include them for a Java MapReduce job.

But how do I include third-party libraries in a Python MapReduce job?

Thank you!

Recommended answer

The problem was solved by using zipimport.

I zipped chardet into the file module.mod and used it like this:

import zipimport

importer = zipimport.zipimporter('module.mod')
chardet = importer.load_module('chardet')
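The answer does not show how module.mod was built, so the following is only a sketch of one way to do it. The helper names `zip_package` and `load_from_archive` are hypothetical. The loader puts the archive on `sys.path` and lets the normal import machinery use zipimport, which is an alternative to calling `load_module` directly (that method is deprecated in recent Python versions). It assumes the package is pure Python, since zipimport cannot load compiled extension modules.

```python
import importlib
import os
import sys
import zipfile


def zip_package(package_dir, archive_path):
    """Write every source file under package_dir into archive_path.

    The package's own directory name is kept as the top-level entry,
    so that `import <package>` works when the archive is on sys.path.
    """
    parent = os.path.dirname(os.path.abspath(package_dir))
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(package_dir):
            for name in files:
                if name.endswith(".pyc"):
                    continue  # compiled caches are not needed in the archive
                full = os.path.join(root, name)
                zf.write(full, os.path.relpath(full, parent))


def load_from_archive(archive_path, module_name):
    """Import module_name from a zip archive placed at the front of sys.path."""
    if archive_path not in sys.path:
        sys.path.insert(0, archive_path)
    importlib.invalidate_caches()
    return importlib.import_module(module_name)
```

With a local copy of the chardet source directory, this could be used as `zip_package("chardet", "module.mod")` on the submitting machine and `load_from_archive("module.mod", "chardet")` inside the mapper. The archive does not need a .zip extension; it only has to be a valid zip file.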

Add -file module.mod to the hadoop streaming command.
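To make the shape of that command concrete, here is a small sketch that assembles the argument list in Python. The jar path, input and output paths, and script names are placeholders rather than values from the original post, and `build_streaming_command` is a hypothetical helper. Only the `-input`, `-output`, `-mapper`, `-reducer`, and `-file` streaming options are used.

```python
import shlex


def build_streaming_command(streaming_jar, input_path, output_path,
                            mapper, reducer, shipped_files):
    """Return the hadoop streaming command as a list of arguments.

    Each entry in shipped_files gets its own -file option, so the
    scripts and the zipped library are copied to every task.
    """
    cmd = [
        "hadoop", "jar", streaming_jar,
        "-input", input_path,
        "-output", output_path,
        "-mapper", mapper,
        "-reducer", reducer,
    ]
    for path in shipped_files:
        cmd += ["-file", path]
    return cmd


def as_shell_line(cmd):
    """Quote the arguments so the command can be pasted into a shell."""
    return " ".join(shlex.quote(arg) for arg in cmd)
```

For example, `build_streaming_command("hadoop-streaming.jar", "in", "out", "mapper.py", "reducer.py", ["mapper.py", "reducer.py", "module.mod"])` yields a list that can be passed to `subprocess.run`. Newer Hadoop releases mark `-file` as deprecated in favor of the generic `-files` option, so the flag may need adjusting depending on the cluster version.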

Now chardet can be used in the script.
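As an illustration of what such a script might look like, the sketch below is a mapper that counts detected encodings per input line. The function names and the output format (encoding, tab, 1) are assumptions made for the example. Only the `zipimporter` / `load_module` calls come from the answer above, and `chardet.detect` is the library's function that returns a dict with an `encoding` key.

```python
import sys
import zipimport


def map_lines(lines, detect):
    """Yield one 'encoding<TAB>1' record per input line.

    `detect` is any callable that takes bytes and returns a dict
    with an 'encoding' key, such as chardet.detect.
    """
    for raw in lines:
        result = detect(raw)
        yield "%s\t1" % (result.get("encoding") or "unknown")


def main():
    # module.mod is in the task's working directory because it was
    # shipped with -file. load_module is the call used in the answer;
    # it is deprecated in recent Python versions.
    importer = zipimport.zipimporter("module.mod")
    chardet = importer.load_module("chardet")
    # Read raw bytes (Python 3) so chardet sees the undecoded input.
    for record in map_lines(sys.stdin.buffer, chardet.detect):
        print(record)

# In the real mapper script, call main() at the bottom of the file.
```

Keeping the detection function as a parameter of `map_lines` means the mapping logic can be tried locally with a stand-in detector before the job is submitted.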

More information can be found in: How can I include a Python package with a Hadoop streaming job?
