Moving Spark DataFrame from Python to Scala within Zeppelin
Problem description
I created a Spark DataFrame in a Python paragraph in Zeppelin:
from pyspark.sql import SQLContext
sqlCtx = SQLContext(sc)
spDf = sqlCtx.createDataFrame(df)
where df is a pandas DataFrame:
print(type(df))
<class 'pandas.core.frame.DataFrame'>
What I want to do is move spDf from one Python paragraph to another, Scala paragraph. A reasonable way to do this seems to be z.put:
z.put("spDf", spDf)
I get this error:
AttributeError: 'DataFrame' object has no attribute '_get_object_id'
Any suggestion to fix the error? Or any other suggestion for moving spDf?
Recommended answer
You can put the internal Java object (df._jdf) instead of the Python wrapper:
%pyspark
df = sc.parallelize([(1, "foo"), (2, "bar")]).toDF(["k", "v"])
z.put("df", df._jdf)
and then make sure you cast it to the correct type on the Scala side:
val df = z.get("df").asInstanceOf[org.apache.spark.sql.DataFrame]
// df: org.apache.spark.sql.DataFrame = [k: bigint, v: string]
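Why does z.put fail for spDf but work for df._jdf? Roughly, z in a %pyspark paragraph forwards calls to the JVM-side Zeppelin context through Py4J, and Py4J sends an argument either by value (primitives such as str, int, float, bool) or as a reference to an object that already lives in the JVM, which it identifies by calling _get_object_id(). A PySpark DataFrame is a plain Python wrapper without that method, while its _jdf attribute is the Py4J handle to the JVM DataFrame. The sketch below models that rule; FakeJavaObject, FakePySparkDataFrame, FakeZeppelinContext and encode_argument are hypothetical stand-ins, and this is a simplified illustration, not the actual Py4J, PySpark, or Zeppelin code.

```python
# Simplified, hypothetical model of a Py4J-style bridge. The class and
# function names are stand-ins for illustration; this is not the real
# Py4J, PySpark, or Zeppelin implementation.


class FakeJavaObject:
    """Stand-in for a py4j JavaObject: a handle to an object living in the JVM."""

    def __init__(self, target_id):
        self._target_id = target_id

    def _get_object_id(self):
        # The bridge sends this id so the JVM can look up the real object.
        return self._target_id


class FakePySparkDataFrame:
    """Stand-in for pyspark.sql.DataFrame: a Python-side wrapper only."""

    def __init__(self, jdf):
        # The wrapped JVM-side DataFrame handle (what df._jdf returns).
        self._jdf = jdf


def encode_argument(value):
    """Encode one argument for a JVM call, following the Py4J-style rule."""
    if value is None or isinstance(value, (bool, int, float, str)):
        # Primitives are serialized by value.
        return ("primitive", value)
    # Everything else is sent by reference, which needs a JVM object id.
    # A plain Python wrapper has no _get_object_id, so this line raises
    # AttributeError, matching the error in the question.
    return ("reference", value._get_object_id())


class FakeZeppelinContext:
    """Stand-in for the shared resource pool behind z.put / z.get."""

    def __init__(self):
        self._pool = {}

    def put(self, name, value):
        # The argument is encoded for the bridge before being stored.
        self._pool[name] = encode_argument(value)

    def get(self, name):
        return self._pool[name]


if __name__ == "__main__":
    z = FakeZeppelinContext()
    spDf = FakePySparkDataFrame(FakeJavaObject("o42"))

    try:
        z.put("spDf", spDf)  # passing the Python wrapper fails
    except AttributeError as err:
        print("wrapper rejected:", err)

    z.put("spDf", spDf._jdf)  # passing the underlying JVM handle works
    print(z.get("spDf"))  # ('reference', 'o42')
```

Because only a reference to a generic JVM object is stored, the Scala side receives it as a plain Object, which is why the answer casts the result of z.get with asInstanceOf[org.apache.spark.sql.DataFrame].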
But it is better to register a temporary view:
%pyspark
# registerTempTable in Spark 1.x
df.createTempView("df")
and use SparkSession.table (SQLContext.table in Spark 1.x) to read it:
// sqlContext.table in Spark 1.x
val df = spark.table("df")
// df: org.apache.spark.sql.DataFrame = [k: bigint, v: string]
To convert in the opposite direction, see Zeppelin: Scala Dataframe to python