How do I install Python libraries automatically on Dataproc cluster startup?


Problem description

How can I automatically install Python libraries on my Dataproc cluster when the cluster starts? This would save me the trouble of logging into the master and/or worker nodes to install the libraries I need by hand.

It would be great to also know if this automated installation could install things only on the master and not the workers.

Solution

Initialization actions are the best way to do this. They are shell scripts that run on each node when the cluster is created, which lets you customize the cluster, for example by installing Python libraries. The scripts must be stored in Google Cloud Storage and can be specified when creating a cluster via the Google Cloud SDK or the Google Developers Console.
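As a rough sketch of that workflow with the Cloud SDK, the snippet below builds the two commands involved: copying a script to Cloud Storage with `gsutil cp`, and creating a cluster with the `--initialization-actions` flag. The bucket, cluster, region, and script names are hypothetical placeholders, not values from the original answer, and the snippet only prints the commands so they can be reviewed before being run.

```shell
#!/bin/sh
# Hypothetical placeholders: replace with your own values.
BUCKET="my-bucket"
CLUSTER="my-cluster"
REGION="us-central1"
SCRIPT="install-pandas.sh"

# Command that stages the initialization action in Cloud Storage.
UPLOAD_CMD="gsutil cp ${SCRIPT} gs://${BUCKET}/${SCRIPT}"

# Command that creates the cluster and points it at the staged script.
CREATE_CMD="gcloud dataproc clusters create ${CLUSTER} --region=${REGION} --initialization-actions=gs://${BUCKET}/${SCRIPT}"

# Print both commands for review instead of executing them.
echo "${UPLOAD_CMD}"
echo "${CREATE_CMD}"
```

Running the printed commands in an authenticated shell would upload the script first and then create the cluster with it.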

Here is a sample initialization action that installs the Python pandas library at cluster creation, only on the master node.

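#!/bin/bash
# Read this node's role from the instance metadata.
# Note: newer Dataproc images expose this key as attributes/dataproc-role.
ROLE=$(/usr/share/google/get_metadata_value attributes/role)
# Install pandas only when running on the master node.
if [[ "${ROLE}" == 'Master' ]]; then
  apt-get install -y python-pandas
fi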

As you can see from this script, you can determine the role of a node with /usr/share/google/get_metadata_value attributes/role and then perform actions specifically on the master (or worker) nodes.
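The same role check can drive different behavior per node type. The sketch below is illustrative only: `packages_for_role` is a hypothetical helper, and the package chosen for non-master nodes is an assumption made for the example. The role is hard-coded here so the snippet runs anywhere; on a cluster node it would come from the metadata lookup shown in the comment.

```shell
#!/bin/sh
# Hypothetical helper: print the package to install for a given node role.
packages_for_role() {
  case "$1" in
    Master) echo "python-pandas" ;;  # master-only package, as in the answer
    *)      echo "python-numpy" ;;   # assumed package for all other nodes
  esac
}

# Hard-coded for illustration. On a cluster node this would instead be:
#   ROLE=$(/usr/share/google/get_metadata_value attributes/role)
ROLE="Master"

# Report the choice; a real initialization action would run something like
#   apt-get install -y "$(packages_for_role "${ROLE}")"
echo "Would install on ${ROLE}: $(packages_for_role "${ROLE}")"
```

Keeping the decision in a small function makes it easy to check the mapping locally before uploading the script as an initialization action.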

See the Google Cloud Dataproc documentation for more details.
