Moving Spark DataFrame from Python to Scala within Zeppelin


Question

I created a Spark DataFrame in a Python paragraph in Zeppelin:

from pyspark.sql import SQLContext

sqlCtx = SQLContext(sc)
spDf = sqlCtx.createDataFrame(df)

where df is a pandas DataFrame:

print(type(df))
<class 'pandas.core.frame.DataFrame'>

What I want to do is move spDf from the Python paragraph to another Scala paragraph. A reasonable way to do this seems to be using z.put:

z.put("spDf", spDf)

I got this error:

AttributeError: 'DataFrame' object has no attribute '_get_object_id'

Any suggestions for fixing the error, or another way to move spDf?

Answer

You can put the internal Java object rather than the Python wrapper:

%pyspark

df = sc.parallelize([(1, "foo"), (2, "bar")]).toDF(["k", "v"])
z.put("df", df._jdf)

and then make sure you use the correct type on the Scala side:

val df = z.get("df").asInstanceOf[org.apache.spark.sql.DataFrame]
// df: org.apache.spark.sql.DataFrame = [k: bigint, v: string]

but it is better to register a temporary table:

%pyspark

# registerTempTable in Spark 1.x
df.createTempView("df")

and use SQLContext.table to read it:

// sqlContext.table in Spark 1.x
val df = spark.table("df")

df: org.apache.spark.sql.DataFrame = [k: bigint, v: string]

To convert in the opposite direction, see Zeppelin: Scala Dataframe to python.
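
For reference, here is a minimal sketch of that reverse direction, assuming a Scala paragraph has already run z.put("df", someScalaDataFrame) and that the Zeppelin-provided sqlContext is available (the names here are illustrative, not from the linked answer):

%pyspark

from pyspark.sql import DataFrame

# z.get("df") returns the Java DataFrame object put by the Scala paragraph;
# wrap it back into a pyspark DataFrame together with the SQLContext.
pyDf = DataFrame(z.get("df"), sqlContext)
pyDf.show()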
