How do I connect to a Dataproc cluster with Jupyter notebooks from Cloud Shell?


Problem description


    I have seen the instructions at https://cloud.google.com/dataproc/docs/tutorials/jupyter-notebook for setting up Jupyter notebooks with Dataproc, but I can't figure out how to alter the process to use Cloud Shell instead of creating an SSH tunnel locally. I have been able to connect to a Datalab notebook by running

    datalab connect vmname 
    

    from Cloud Shell and then using the web preview function. I would like to do something similar, but with Jupyter notebooks and a Dataproc cluster.

    Solution

    In theory, you can mostly follow the instructions at https://cloud.google.com/shell/docs/features#web_preview and use local port forwarding to access your Jupyter notebooks on Dataproc through Cloud Shell's "web preview" feature. Something like the following in your Cloud Shell:

    gcloud compute ssh my-cluster-m -- -L 8080:my-cluster-m:8123
    

    However, there are two issues which prevent this from working:

    1. You need to modify the Jupyter config to add the following to the bottom of /root/.jupyter/jupyter_notebook_config.py:

      c.NotebookApp.allow_origin = '*'
      

    2. Cloud Shell's web preview needs to add support for websockets.

    If you don't do (1), you'll get popup errors when trying to create a notebook, because Jupyter refuses the Cloud Shell proxy domain. Unfortunately, (2) requires deeper support from Cloud Shell itself; it manifests as errors like "A connection to the notebook server could not be established."
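For step (1), a minimal sketch of appending the setting only once might look like the following. The function name `ensure_allow_origin` is hypothetical, and the sketch assumes that an uncommented `c.NotebookApp.allow_origin` line anywhere in the file means the setting is already handled; the config path in the usage comment is the one given above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append the allow_origin setting to a Jupyter config
# file unless an uncommented allow_origin line is already present.
ensure_allow_origin() {
  local config_file="$1"
  local setting="c.NotebookApp.allow_origin = '*'"

  # Create the directory and file if they do not exist yet.
  mkdir -p "$(dirname "$config_file")"
  touch "$config_file"

  # Only match active (uncommented) settings; generated configs often
  # contain a commented-out "#c.NotebookApp.allow_origin" line.
  if ! grep -qE '^[[:space:]]*c\.NotebookApp\.allow_origin' "$config_file"; then
    # Leading newline guards against a file that lacks a trailing newline.
    printf '\n%s\n' "$setting" >> "$config_file"
  fi
}

# Example (run as root on the cluster master):
# ensure_allow_origin /root/.jupyter/jupyter_notebook_config.py
```

Running it twice leaves a single active `allow_origin` line, so it could be placed in a script that runs more than once. Jupyter typically needs a restart before a config change takes effect.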

    Another option that doesn't require waiting for (2) is to run your own nginx proxy as part of the Jupyter initialization action on a Dataproc cluster, if you can get it to proxy websockets suitably. See this thread for a similar situation: https://github.com/jupyter/notebook/issues/1311

    Broken websocket support in proxy layers like this is a common problem, since websockets are still relatively new; over time more and more things will start to support them out of the box.

    Alternatively:

    Dataproc also supports a Datalab initialization action, which is set up so that websocket proxying is already taken care of. So if you don't depend on Jupyter specifically, the following works in Cloud Shell:

    gcloud dataproc clusters create my-datalab-cluster \
        --initialization-actions gs://dataproc-initialization-actions/datalab/datalab.sh
    gcloud compute ssh my-datalab-cluster-m -- -L 8080:my-datalab-cluster-m:8080
    

    And then select the usual "Web Preview" on port 8080. Or you can select other Cloud Shell supported ports for the local binding like:

    gcloud compute ssh my-datalab-cluster-m -- -L 8082:my-datalab-cluster-m:8080
    

    In which case you'd select 8082 as the web preview port.
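Because the tunnel commands above differ only in the cluster name and ports, they can be sketched as a small helper that prints the command for a given port pair. The function name `tunnel_command` is hypothetical, and it assumes the master node is named `<cluster>-m`, as in the examples above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the gcloud command that forwards a local
# Cloud Shell port to a port on the cluster's master node.
# Assumes the master node is named "<cluster>-m", as in the examples above.
tunnel_command() {
  local cluster="$1" local_port="$2" remote_port="$3"
  printf 'gcloud compute ssh %s-m -- -L %s:%s-m:%s\n' \
    "$cluster" "$local_port" "$cluster" "$remote_port"
}

# Print the command for local port 8082 -> Datalab on port 8080.
tunnel_command my-datalab-cluster 8082 8080
```

The helper only prints the command so it can be checked before running; whichever local port is passed in is the one to pick in Web Preview.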
