Using Python with Zeppelin under the Spark 2 Interpreter

Problem description

I have deployed HDP 2.6.4 on a virtual machine.

I can see that Spark2 is not pointing to the correct python folder. My questions are:

1) How can I find where my python is located?

Solution: type whereis python and you will get a list of where it is.

2) How can I update the existing python libraries and add new libraries to that folder? For example, the equivalent of 'pip install numpy' on the CLI.

  • Still unclear

3) How can I make Zeppelin Spark2 point at the specific directory containing the python folder that I can update? On Zeppelin, there is a little 'edit' button where I can change the path to the directory that contains python.

Solution: go to the interpreter settings in Zeppelin, find spark2, and make zeppelin.pyspark.python point to where python is already installed.

Now, if you need Python 3.4+, there is a whole different set of steps you have to follow, starting with getting Python 3.4+ into the HDP sandbox itself; one possible route is sketched below.
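A hedged sketch of one common route on a CentOS-based sandbox, using Software Collections; the package names below assume a CentOS 7 base (older sandboxes on CentOS 6 ship different collections), so treat this as an outline rather than exact steps:

    # enable the Software Collections repository, then install a Python 3 build
    sudo yum install -y centos-release-scl
    sudo yum install -y rh-python36
    # open a shell with this python3 first on the PATH
    scl enable rh-python36 bash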

Thanks

Recommended answer

For a sandbox environment like yours, the sandbox image is built on a Linux OS (CentOS). The Zeppelin notebook, in all probability, points to the Python installation that comes along with every Linux OS. If you wish to have your own installation of Python and your own set of libraries for data analysis, like those in the SciPy stack, you need to install Anaconda on your virtual machine. Your VM needs to be connected to the internet so that you can download and install the Anaconda package for testing.
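A minimal sketch of that install, run from the sandbox terminal; the installer filename and version below are only an example, so pick a current release from Anaconda's archive:

    # download an Anaconda 3 installer (example version from https://repo.anaconda.com/archive/)
    wget https://repo.anaconda.com/archive/Anaconda3-5.0.1-Linux-x86_64.sh
    # run the interactive installer; the default install prefix is /home/<user>/anaconda3
    bash Anaconda3-5.0.1-Linux-x86_64.sh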

You can then point Zeppelin at Anaconda's directory, down to the following path: /home/user/anaconda3/bin/python, where user is your username.
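In the spark2 interpreter settings, that would look as follows (the /home/user prefix is whatever your actual home directory is):

    zeppelin.pyspark.python = /home/user/anaconda3/bin/python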

The Zeppelin configuration also confirms the fact that it uses the default python installation at /usr/bin/python. You can go through its documentation for more information.
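As an aside, Zeppelin's conf/zeppelin-env.sh template also exposes a PYSPARK_PYTHON export that serves the same purpose as the interpreter property; a sketch, with the path again being an example:

    # conf/zeppelin-env.sh
    export PYSPARK_PYTHON=/home/user/anaconda3/bin/python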

Update

Hi Joseph, Spark installations, by default, use the Python interpreter and the python libraries that have been installed on your OS. The folder structure that you have shown only tells you the location of the PySpark module. This module is a library like Pandas or NumPy.

What you can do is install the SciPy stack [NumPy, Pandas, Matplotlib, etc.] via the command pip install <package-name> and import those libraries directly into your Zeppelin notebook.
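For instance (run pip against the same Python installation that Zeppelin points at; numpy, pandas, and matplotlib here are just example packages):

    pip install numpy pandas matplotlib

and then, in a Zeppelin paragraph:

    %pyspark
    import numpy as np
    import pandas as pd
    # quick smoke test that the libraries resolve
    print(np.arange(5).mean())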

Use the command whereis python in the terminal of your sandbox; the result will give you something like the following: /usr/bin/python /usr/bin/python2.7 ....
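That is, a transcript along these lines (typical of a CentOS sandbox; the exact list and the version shown are only illustrative):

    $ whereis python
    python: /usr/bin/python /usr/bin/python2.7 ...
    $ /usr/bin/python --version   # confirm what the default points at
    Python 2.7.5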

In your Zeppelin configuration, for the property zeppelin.pyspark.python, you can set the first value from the output of the previous command, i.e. /usr/bin/python. So now all the libraries you installed via the pip install command will be available to you in Zeppelin.
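After saving the property and restarting the interpreter, a quick way to confirm the binding from a notebook paragraph (a hedged check, not an official Zeppelin feature):

    %pyspark
    import sys
    # should print the path you set in zeppelin.pyspark.python
    print(sys.executable)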

This process would only work for your sandbox environment. In a real production cluster, your administrator needs to install all these libraries on all the nodes of your Spark cluster.
