%matplotlib inline magic command fails to read variables from previous cells in AWS-EMR Jupyterhub Notebook


Question

I'm trying to plot a Spark dataset with matplotlib after converting it to a pandas DataFrame in AWS EMR JupyterHub.

I can plot within a single cell using matplotlib, like this:

%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt

df = [1, 1.6, 3, 4.2, 5, 4, 2.5, 3, 1.5]
plt.plot(df)

The snippet above works cleanly for me.

After this sample, I moved on to plotting my pandas DataFrame across new, separate cells in AWS-EMR JupyterHub, like this:

-Cell 1-
sparkDS=spark.read.parquet('s3://bucket_name/path').cache()


-Cell 2-
from pyspark.sql.functions import *
sparkDS_groupBy=sparkDS.groupBy('col1').agg(count('*').alias('count')).orderBy('col1')
pandasDF=sparkDS_groupBy.toPandas()


-Cell 3-
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt

plt.plot(pandasDF)

My code fails only in cell 3, with the following error:

NameError: name 'pandasDF' is not defined

Does anyone know what's wrong?

Why can't the new cell in my JupyterHub notebook recognize a variable from the previous cell?

Does it have something to do with the '%matplotlib inline' magic command? (I also tried '%matplotlib notebook', but that failed too.)

P.S. I'm using an AWS EMR 5.19 JupyterHub notebook setup for my plotting work.

This error is somewhat similar to, but not a duplicate of, "How to make matplotlib work in AWS EMR Jupyter notebook?"

Recommended answer

You'll want to look into the %%spark -o df_name and %%local magics; typing %%help in a cell lists them.

Specifically, in your case try:

  1. Use %%spark -o sparkDS_groupBy at the start of -Cell 2-.
  2. Start -Cell 3- with %%local.
  3. Plot sparkDS_groupBy in -Cell 3- instead of pandasDF.
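
To see why these steps help, here is a toy model in plain Python. It assumes, as the magics' names suggest, that ordinary cells in the PySpark kernel run in a Spark session on the cluster while %%local cells (and, it appears, cells that start with a line magic such as %matplotlib) run in the local notebook process, so the two sides do not share variables. The helper names run_on_cluster and run_locally are hypothetical and are not part of sparkmagic; two dictionaries stand in for the two interpreters.

```python
# Toy model only: two plain dicts stand in for the two interpreters.
# The helper names below are hypothetical and are not part of sparkmagic.
cluster_ns = {}   # stands in for the remote Spark session on the cluster
local_ns = {}     # stands in for the local notebook process


def run_on_cluster(code, output=None):
    """Run code in the 'cluster' namespace; optionally copy one name out,
    loosely mimicking what `%%spark -o name` does."""
    exec(code, cluster_ns)
    if output is not None:
        # The real magic delivers a pandas DataFrame on the local side;
        # this model just copies the object as-is.
        local_ns[output] = cluster_ns[output]


def run_locally(code):
    """Run code in the 'local' namespace, as a %%local cell would."""
    exec(code, local_ns)


# Cell 2 without -o: the variable exists only on the "cluster".
run_on_cluster("pandasDF = [1, 2, 3]")

# Cell 3 runs locally and cannot see it.
try:
    run_locally("total = sum(pandasDF)")
    error_message = None
except NameError as exc:
    error_message = str(exc)
print(error_message)

# Cell 2 with -o: the named variable is copied to the local side.
run_on_cluster("sparkDS_groupBy = [1, 2, 3]", output="sparkDS_groupBy")
run_locally("total = sum(sparkDS_groupBy)")
print(local_ns["total"])
```

In this model the first local run fails with the same kind of NameError as in the question, and the second succeeds because the variable was explicitly exported.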


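For those with less context, you can get a plot by running the following in an EMR Notebook with a PySpark kernel, attached to an EMR cluster on at least release 5.26.0 (the release that introduced notebook-scoped libraries). Each code block below represents one cell.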

%%help

%%configure -f
{ "conf":{
"spark.pyspark.python": "python3",
"spark.pyspark.virtualenv.enabled": "true",
"spark.pyspark.virtualenv.type":"native",
"spark.pyspark.virtualenv.bin.path":"/usr/bin/virtualenv"
}}

sc.install_pypi_package("matplotlib")

%%spark -o my_df
# in this cell, my_df is a pyspark.sql.DataFrame
# use the SparkSession (spark) to read; the SparkContext (sc) has no .read
my_df = spark.read.text("s3://.../...")

%%local
%matplotlib inline

import matplotlib.pyplot as plt
# in this cell, my_df is a pandas.DataFrame
plt.plot(my_df)
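
Once the data is available locally as a pandas DataFrame, it may read better to plot named columns rather than pass the whole frame to plt.plot. The following is a minimal sketch of what the body of the %%local cell could look like. The small DataFrame is a made-up stand-in for the exported data, with the col1 and count column names borrowed from the question's groupBy, and the Agg backend is selected only so the sketch also runs outside a notebook (inside a notebook, %matplotlib inline would take its place).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the pandas DataFrame delivered by `%%spark -o`.
my_df = pd.DataFrame({"col1": ["a", "b", "c"], "count": [10, 4, 7]})

# One bar per group: col1 on the x-axis, count on the y-axis.
fig, ax = plt.subplots()
ax.bar(my_df["col1"], my_df["count"])
ax.set_xlabel("col1")
ax.set_ylabel("count")
```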
