How to install Python packages in a Google Dataproc cluster

Question

Is it possible to install Python packages in a Google Dataproc cluster after the cluster has been created and is running?

I tried running "pip install xxxxxxx" on the master node's command line, but it does not seem to work.

Google's Dataproc documentation does not mention this situation.

Recommended answer

This is generally not possible after the cluster has been created. I recommend using an initialization action instead, which runs a script on the cluster's nodes when the cluster is created.

As you've noticed, pip is also not available by default, so you'll want to run easy_install pip first and then run your pip install command.
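As a rough illustration of the two steps above, the following Python sketch renders the text of a shell script that could serve as an initialization action, and builds the matching gcloud command line. The function names, the package list, the cluster name and the gs:// path are all hypothetical placeholders, and your gcloud setup may need additional flags (for example a region).

```python
import shlex


def render_init_action(packages):
    """Return the text of a shell script that bootstraps pip, then installs packages.

    `packages` is a list of pip requirement strings (hypothetical examples below).
    """
    quoted = " ".join(shlex.quote(p) for p in packages)
    lines = [
        "#!/bin/bash",
        "set -euo pipefail",
        "# pip is not available by default, so install it first.",
        "easy_install pip",
        f"pip install {quoted}",
    ]
    return "\n".join(lines) + "\n"


def create_cluster_command(cluster_name, script_uri):
    """Build the argument list for creating a cluster that runs the script.

    `script_uri` is assumed to be a gs:// location where the script was uploaded.
    """
    return [
        "gcloud", "dataproc", "clusters", "create", cluster_name,
        "--initialization-actions", script_uri,
    ]


if __name__ == "__main__":
    # Hypothetical package names, cluster name and bucket path.
    print(render_init_action(["numpy", "requests"]))
    print(" ".join(create_cluster_command(
        "my-cluster", "gs://my-bucket/install-packages.sh")))
```

The rendered script would be saved to a file, uploaded to a GCS bucket, and referenced by its gs:// URI when the cluster is created.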

Finally, if you intend to use this cluster in any automation, and/or you want hermetic builds, I recommend creating a wheel, storing it in GCS, and downloading it in the init action. You would then install the wheel. Wheels have the added benefit of being faster than installing many packages from pip directly.
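The wheel-based approach can be sketched the same way. This is a minimal example, assuming the wheel has already been built and uploaded; the function name, bucket path and wheel filename are hypothetical, and the download directory is an arbitrary choice.

```python
import posixpath
import shlex


def render_wheel_init_action(wheel_uri, download_dir="/tmp"):
    """Return a shell script that copies a wheel from GCS and installs it.

    `wheel_uri` is assumed to be a gs:// URI pointing at a .whl file.
    """
    if not wheel_uri.startswith("gs://") or not wheel_uri.endswith(".whl"):
        raise ValueError("expected a gs:// URI ending in .whl")
    # Keep the original wheel filename: pip reads package metadata from it.
    local_path = posixpath.join(download_dir, posixpath.basename(wheel_uri))
    lines = [
        "#!/bin/bash",
        "set -euo pipefail",
        "# pip is not available by default, so install it first.",
        "easy_install pip",
        f"gsutil cp {shlex.quote(wheel_uri)} {shlex.quote(local_path)}",
        f"pip install {shlex.quote(local_path)}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical bucket and wheel name.
    print(render_wheel_init_action(
        "gs://my-bucket/wheels/mypkg-1.0-py3-none-any.whl"))
```

Note that pip will still fetch any dependencies the wheel declares from the package index, so a fully hermetic setup would need those supplied as wheels too.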

2019 update

See this tutorial on how to configure the Python environment on Dataproc: https://cloud.google.com/dataproc/docs/tutorials/python-configuration
