Moving Spark DataFrame from Python to Scala within Zeppelin

Views: 32 Published: 2021/11/14 23:50:59 Tags: python scala apache-spark apache-spark-sql apache-zeppelin

I created a Spark DataFrame in a Python paragraph in Zeppelin:

from pyspark.sql import SQLContext

sqlCtx = SQLContext(sc)
spDf = sqlCtx.createDataFrame(df)

where df is a pandas DataFrame:

print(type(df))
<class 'pandas.core.frame.DataFrame'>

What I want to do is move spDf from the Python paragraph to a Scala paragraph. Using z.put looks like a reasonable way to do it:

z.put("spDf", spDf)

but I got this error:

AttributeError: 'DataFrame' object has no attribute '_get_object_id'

Any suggestions for fixing the error, or for another way to move spDf?

You can put the internal Java object rather than the Python wrapper:

%pyspark

df = sc.parallelize([(1, "foo"), (2, "bar")]).toDF(["k", "v"])
z.put("df", df._jdf)

and then make sure you use the correct type:

val df = z.get("df").asInstanceOf[org.apache.spark.sql.DataFrame]
// df: org.apache.spark.sql.DataFrame = [k: bigint, v: string]
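As for why df._jdf gets through where spDf did not: z.put hands its argument to the JVM through Py4J, and Py4J can only pass along Python objects that are handles to JVM objects, which it identifies by calling _get_object_id on them. The PySpark DataFrame is a plain Python wrapper that keeps such a handle in _jdf. The sketch below imitates that with stand-in classes. FakeJavaObject, FakeDataFrame and to_jvm_reference are hypothetical names made up for illustration, not the real Py4J or PySpark code, and the conversion is heavily simplified.

```python
# Simplified stand-ins; these are NOT the real Py4J or PySpark classes.

class FakeJavaObject:
    """Plays the role of a Py4J handle to an object living in the JVM."""

    def __init__(self, object_id):
        self._object_id = object_id

    def _get_object_id(self):
        return self._object_id


class FakeDataFrame:
    """Plays the role of the Python-side DataFrame wrapper."""

    def __init__(self, jdf):
        self._jdf = jdf  # the wrapped JVM handle, as in PySpark


def to_jvm_reference(value):
    """Imitate turning a non-primitive argument into a JVM reference."""
    return "ref:" + value._get_object_id()


wrapper = FakeDataFrame(FakeJavaObject("o42"))

# Passing the wrapper fails because it has no _get_object_id method.
try:
    to_jvm_reference(wrapper)
    wrapper_error = None
except AttributeError as exc:
    wrapper_error = str(exc)

# Passing the wrapped handle works.
handle_reference = to_jvm_reference(wrapper._jdf)

print(wrapper_error)
print(handle_reference)
```

The first print shows the same kind of AttributeError as in the question; the second shows the handle being accepted. Note that _jdf is a private attribute, so relying on it ties the notebook to PySpark internals.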

But it is better to register a temporary table:

%pyspark

# registerTempTable in Spark 1.x
df.createTempView("df")

and use SQLContext.table to read it:

// sqlContext.table in Spark 1.x
val df = spark.table("df")

// df: org.apache.spark.sql.DataFrame = [k: bigint, v: string]
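One caveat when a Zeppelin paragraph is re-run: createTempView raises an error if a view with that name already exists, whereas createOrReplaceTempView (Spark 2.x and later) and registerTempTable (Spark 1.x) overwrite it. A small helper along these lines could cover both API generations. register_view and StubFrame are hypothetical names used only for this sketch; StubFrame is there so the snippet runs outside Zeppelin and is not part of Spark.

```python
def register_view(df, name):
    """Register df under name, overwriting any view that already uses it."""
    if hasattr(df, "createOrReplaceTempView"):
        df.createOrReplaceTempView(name)  # Spark 2.x and later
        return "createOrReplaceTempView"
    df.registerTempTable(name)  # Spark 1.x
    return "registerTempTable"


class StubFrame:
    """Stand-in for a Spark 1.x DataFrame; records the names it was given."""

    def __init__(self):
        self.registered = []

    def registerTempTable(self, name):
        self.registered.append(name)


stub = StubFrame()
used_method = register_view(stub, "df")
print(used_method, stub.registered)
```

In a %pyspark paragraph the call would simply be register_view(df, "df"), after which the Scala paragraph reads the view by name as shown above.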

To convert in the opposite direction, see Zeppelin: Scala Dataframe to python.
