Using Python with Zeppelin under the Spark 2 Interpreter


Problem Description

I have deployed HDP: 2.6.4 on a virtual machine

I can see that spark2 is not pointing to the correct Python folder. My questions are:

1) How can I find where my Python is located?

Solution: type whereis python and you will get a list of the locations where it is installed.
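
If you would rather check from inside Zeppelin itself, a minimal sketch (assuming you can run a %python or %pyspark paragraph) is to ask the running interpreter for its own path and version:

```python
# Minimal sketch: run in a %python or %pyspark Zeppelin paragraph (or at a plain
# `python` prompt on the sandbox) to see exactly which interpreter is in use.
import sys

print(sys.executable)  # full path of the interpreter running this paragraph
print(sys.version)     # its version string
print(sys.prefix)      # root of the installation it belongs to
```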

2) How can I update the existing Python libraries and add new libraries to that folder? For example, the equivalent of 'pip install numpy' on the CLI.

  • Still unclear

3) How can I make Zeppelin Spark2 point at the specific directory that contains the Python folder that I can update? On Zeppelin there is a little 'edit' button with which I can change the path to the directory that contains Python.

Solution: go to the interpreter settings in Zeppelin, find spark2, and make zeppelin.pyspark.python point to where Python already is.

Now, if you need Python 3.4+, there is a whole different set of steps you have to follow to first get Python 3.4+ into the HDP sandbox.

Thanks,

Answer

For a sandbox environment like yours, the sandbox image is built on a Linux OS (CentOS). The Zeppelin notebook, in all probability, points to the Python installation that ships with every Linux OS. If you wish to have your own installation of Python and your own set of data-analysis libraries, such as those in the SciPy stack, you need to install Anaconda on your virtual machine. Your VM needs to be connected to the internet so that you can download and install the Anaconda package for testing.

You can then point Zeppelin to Anaconda's directory, down to the following path: /home/user/anaconda3/bin/python, where user is your username.
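
To confirm the change took effect, a rough check (using the /home/user/anaconda3/bin/python path from above) is to restart the spark2 interpreter and print the interpreter path from a notebook paragraph:

```python
# Rough check in a %pyspark paragraph after restarting the spark2 interpreter:
# sys.executable should now point into the Anaconda installation.
import sys

print(sys.executable)  # expect something like /home/user/anaconda3/bin/python
if "anaconda3" not in sys.executable:
    print("Zeppelin is still using a different Python installation")
```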

The Zeppelin configuration also confirms that it uses the default Python installation at /usr/bin/python. You can go through its documentation for more information.

Update

Hi Joseph, Spark installations use, by default, the Python interpreter and the Python libraries that are already installed on your OS. The folder structure you have shown only tells you the location of the PySpark module. That module is a library, just like Pandas or NumPy.

What you can do is install the SciPy stack [NumPy, Pandas, Matplotlib, etc.] via the command pip install <package-name> and import those libraries directly into your Zeppelin notebook.
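
One way to make sure pip installs into the same interpreter that Zeppelin points to is to invoke pip through that interpreter. Here is a small sketch; the package list is only an example, and it assumes pip is available for that interpreter:

```python
# Sketch: install SciPy-stack packages into the interpreter running this code,
# so they end up under the Python that zeppelin.pyspark.python refers to.
# Roughly equivalent to running `/usr/bin/python -m pip install numpy pandas matplotlib`
# in the sandbox shell.
import subprocess
import sys

packages = ["numpy", "pandas", "matplotlib"]  # example package list
subprocess.check_call([sys.executable, "-m", "pip", "install"] + packages)
```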

Use the command whereis python in the terminal of your sandbox; the result will be something like the following: /usr/bin/python /usr/bin/python2.7 ....

In your Zeppelin configuration, for the property zeppelin.pyspark.python, you can set the first value from the output of the previous command, i.e. /usr/bin/python. Now all the libraries you installed via pip install will be available to you in Zeppelin.
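
A quick way to confirm that the libraries are actually visible to Zeppelin is to import them in a %pyspark paragraph and print their versions, for example:

```python
# Quick check in a %pyspark paragraph: the pip-installed libraries should now
# import from the interpreter configured in zeppelin.pyspark.python.
import numpy
import pandas

print(numpy.__version__)
print(pandas.__version__)
```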

This process will only work for your sandbox environment. In a real production cluster, your administrator needs to install all of these libraries on every node of your Spark cluster.
