How can I use python > 2.6.6 with spark on BigInsights on cloud Enterprise clusters?


Question

The version of python with BigInsights is currently 2.6.6. How can I use a different version of Python with my spark jobs running on yarn?

Note that users of BigInsights on cloud do not have root access.

Answer

Install Anaconda

This script installs anaconda python on a BigInsights on cloud 4.2 Enterprise cluster. Note that these instructions do NOT work for Basic clusters because you are only able to login to a shell node and not any other nodes.

SSH into the mastermanager node, then run the following (changing the values for your environment):

export BI_USER=snowch
export BI_PASS=changeme
export BI_HOST=bi-hadoop-prod-4118.bi.services.us-south.bluemix.net

Next, run the following. The script attempts to be as idempotent as possible, so it shouldn't matter if you run it multiple times:

# abort if the script encounters an error, an undeclared variable,
# or a failure in any stage of a pipeline
set -euo pipefail

CLUSTER_NAME=$(curl -s -k -u $BI_USER:$BI_PASS  -X GET https://${BI_HOST}:9443/api/v1/clusters | python -c 'import sys, json; print(json.load(sys.stdin)["items"][0]["Clusters"]["cluster_name"]);')
echo Cluster Name: $CLUSTER_NAME

CLUSTER_HOSTS=$(curl -s -k -u $BI_USER:$BI_PASS  -X GET https://${BI_HOST}:9443/api/v1/clusters/${CLUSTER_NAME}/hosts | python -c 'import sys, json; items = json.load(sys.stdin)["items"]; hosts = [ item["Hosts"]["host_name"] for item in items ]; print(" ".join(hosts));')
echo Cluster Hosts: $CLUSTER_HOSTS

wget -c https://repo.continuum.io/archive/Anaconda2-4.1.1-Linux-x86_64.sh

# Install anaconda if it isn't already installed
[[ -d anaconda2 ]] || bash Anaconda2-4.1.1-Linux-x86_64.sh -b

# You can install your pip modules using something like this:
# ${HOME}/anaconda2/bin/python -c 'import yourlibrary' || ${HOME}/anaconda2/bin/pip install yourlibrary

# Install anaconda on all of the cluster nodes
for CLUSTER_HOST in ${CLUSTER_HOSTS}; 
do 
   if [[ "$CLUSTER_HOST" != "$BI_HOST" ]];
   then
      echo "*** Processing $CLUSTER_HOST ***"
      ssh $BI_USER@$CLUSTER_HOST "wget -q -c https://repo.continuum.io/archive/Anaconda2-4.1.1-Linux-x86_64.sh"
      ssh $BI_USER@$CLUSTER_HOST "[[ -d anaconda2 ]] || bash Anaconda2-4.1.1-Linux-x86_64.sh -b"

      # You can install your pip modules on each node using something like this:
      # ssh $BI_USER@$CLUSTER_HOST "${HOME}/anaconda2/bin/python -c 'import yourlibrary' || ${HOME}/anaconda2/bin/pip install yourlibrary"

      # Set the PYSPARK_PYTHON path on all of the nodes
      ssh $BI_USER@$CLUSTER_HOST "grep '^export PYSPARK_PYTHON=' ~/.bash_profile || echo export PYSPARK_PYTHON=${HOME}/anaconda2/bin/python2.7 >> ~/.bash_profile"
      ssh $BI_USER@$CLUSTER_HOST "sed -i -e 's;^export PYSPARK_PYTHON=.*$;export PYSPARK_PYTHON=${HOME}/anaconda2/bin/python2.7;g' ~/.bash_profile"
      ssh $BI_USER@$CLUSTER_HOST "cat ~/.bash_profile"
   fi
done

echo 'Finished installing'
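For reference, the JSON parsing packed into the `CLUSTER_HOSTS` one-liner above can be written out as a standalone sketch. The sample payload below is hypothetical (real Ambari responses carry many more fields), but the host names live at the same path:

```python
import json

# Hypothetical, trimmed-down Ambari /api/v1/clusters/<name>/hosts response.
sample_response = '''
{
  "items": [
    {"Hosts": {"host_name": "node1.example.com"}},
    {"Hosts": {"host_name": "node2.example.com"}}
  ]
}
'''

def extract_host_names(payload):
    """Return the host names listed in an Ambari hosts response."""
    items = json.loads(payload)["items"]
    return [item["Hosts"]["host_name"] for item in items]

# The shell script joins the names with spaces so it can iterate over them.
print(" ".join(extract_host_names(sample_response)))
```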

Running pyspark jobs

If you are using pyspark, you can use the Anaconda python by setting the following variables before running the pyspark command:

export SPARK_HOME=/usr/iop/current/spark-client
export HADOOP_CONF_DIR=/usr/iop/current/hadoop-client/conf

# set these to the folders where you installed anaconda
export PYSPARK_PYTHON=/home/biadmin/anaconda2/bin/python2.7
export PYSPARK_DRIVER_PYTHON=/home/biadmin/anaconda2/bin/python2.7

spark-submit --master yarn --deploy-mode client ...

# NOTE: --deploy-mode cluster does not seem to use the PYSPARK_PYTHON setting
...
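In cluster mode the driver runs inside YARN, so a variable exported in the client shell is not visible to it. One possible workaround (a sketch, not verified on BigInsights, and assuming your Spark-on-YARN build honours these standard properties) is to pass the interpreter path as Spark configuration instead:

```shell
# Sketch: point the YARN application master and the executors at the
# Anaconda interpreter. The paths assume the install script above was
# run as biadmin on every node; your_job.py is a placeholder name.
spark-submit --master yarn --deploy-mode cluster \
  --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=/home/biadmin/anaconda2/bin/python2.7 \
  --conf spark.executorEnv.PYSPARK_PYTHON=/home/biadmin/anaconda2/bin/python2.7 \
  your_job.py
```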

Zeppelin (optional)

If you are using Zeppelin (as per these instructions for BigInsights on cloud), set the following variables in zeppelin_env.sh:

# set these to the folders where you installed anaconda
export PYSPARK_PYTHON=/home/biadmin/anaconda2/bin/python2.7
export PYSPARK_DRIVER_PYTHON=/home/biadmin/anaconda2/bin/python2.7
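A quick way to confirm that a notebook paragraph (or any pyspark session) picked up the Anaconda interpreter rather than the stock 2.6.6 is to inspect the version tuple. A minimal helper for that check (the function name is illustrative, not part of any API):

```python
import sys

def python_is_new_enough(version_info=sys.version_info, minimum=(2, 7)):
    """True when the running interpreter meets the minimum version."""
    return tuple(version_info[:2]) >= minimum

# Run this after restarting the interpreter, to verify that the
# PYSPARK_PYTHON setting took effect.
print(sys.version)
print(python_is_new_enough())
```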
