How to open the Spark UI when working on Google Colaboratory?
Question
How can I monitor the progress of a job through the Spark web UI? When I run Spark in local mode on my own PC, I can reach the Spark UI on port 4040 by opening http://localhost:4040.
Recommended answer
Following this Colab notebook, you can do the following.
First, configure the Spark UI port and start a Spark session:
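import findspark
findspark.init()  # locate the Spark installation and make pyspark importable
from pyspark.sql import SparkSession
from pyspark import SparkContext, SparkConf
# Move the Spark UI off its default port 4040, which ngrok's local API uses
conf = SparkConf().set('spark.ui.port', '4050')
sc = SparkContext(conf=conf)
# getOrCreate() reuses the SparkContext created above
spark = SparkSession.builder.master('local[*]').getOrCreate()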
In the next cell, run:
!wget https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip
!unzip ngrok-stable-linux-amd64.zip
get_ipython().system_raw('./ngrok http 4050 &')
This downloads ngrok and starts a tunnel to port 4050, creating a URL through which you can access the Spark UI (wait about 10 seconds for it to start).
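Instead of waiting a fixed 10 seconds, one option is to poll until the port accepts connections. The following is a minimal sketch using only the Python standard library. `wait_for_port` and its parameters are hypothetical names introduced here for illustration; they are not part of ngrok or Spark.

```python
import socket
import time


def wait_for_port(port, host="localhost", timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper: the name and defaults are assumptions, not an ngrok API.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Nothing is listening yet; pause briefly and try again.
            time.sleep(interval)
    return False
```

In a notebook cell, `wait_for_port(4040)` would wait for ngrok's local API and `wait_for_port(4050)` for the Spark UI, assuming both run on the Colab machine itself.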
Now, to get the URL, query ngrok's local API. It listens on port 4040, which is why the Spark UI was moved to port 4050:
!curl -s http://localhost:4040/api/tunnels
This prints JSON that looks something like this (truncated):
{"tunnels":[{"name":"command_line","uri":"/api/tunnels/command_line","public_url":"https://1b881e94406c.ngrok.io","proto":"https", ... }
The value you're looking for is the "public_url" above; that's your Spark UI's URL.
Alternatively, run this:
!curl -s http://localhost:4040/api/tunnels | python3 -c "import sys, json; print(json.load(sys.stdin)['tunnels'][0]['public_url'])"
I've tested it and it works for me.
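The curl one-liner above can also be written as a small Python helper, which is easier to reuse inside a notebook. This is a sketch, not part of the original answer: `extract_public_url` and `fetch_public_url` are hypothetical names, and preferring the tunnel whose "proto" is "https" is an assumption based on the JSON shape shown above.

```python
import json
from urllib.request import urlopen


def extract_public_url(tunnels_json, proto="https"):
    """Return the public_url of the first tunnel whose proto matches.

    Falls back to the first tunnel listed if none matches.
    Hypothetical helper; assumes the JSON shape shown in the answer.
    """
    tunnels = json.loads(tunnels_json).get("tunnels", [])
    if not tunnels:
        raise ValueError("no tunnels reported; is ngrok running yet?")
    for tunnel in tunnels:
        if tunnel.get("proto") == proto:
            return tunnel["public_url"]
    return tunnels[0]["public_url"]


def fetch_public_url(api_url="http://localhost:4040/api/tunnels"):
    """Query ngrok's local API (URL taken from the answer) and return the public URL."""
    with urlopen(api_url, timeout=5) as resp:
        return extract_public_url(resp.read().decode("utf-8"))
```

Calling `print(fetch_public_url())` in a cell, once ngrok is up, should print the same URL as the curl command.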