Zeppelin: Scala DataFrame to Python

Problem description

If I have a Scala paragraph with a DataFrame, can I share it with Python and use it there? (As I understand it, PySpark uses Py4J.)

I tried this:

Scala paragraph:

x.printSchema
z.put("xtable", x )

Python paragraph:

%pyspark

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import seaborn as sns

the_data = z.get("xtable")

print the_data

sns.set()
g = sns.PairGrid(data=the_data,
                 x_vars=dependent_var,
                 y_vars=sensor_measure_columns_names +  operational_settings_columns_names,
                 hue="UnitNumber", size=3, aspect=2.5)
g = g.map(plt.plot, alpha=0.5)
g = g.set(xlim=(300,0))
g = g.add_legend()

Error:

Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark.py", line 222, in <module>
    eval(compiledCode)
  File "<string>", line 15, in <module>
  File "/usr/local/lib/python2.7/dist-packages/seaborn/axisgrid.py", line 1223, in __init__
    hue_names = utils.categorical_order(data[hue], hue_order)
TypeError: 'JavaObject' object has no attribute '__getitem__'

Solution:

%pyspark

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import seaborn as sns

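from io import BytesIO

def show(p):
    # Render the figure as SVG into an in-memory buffer and emit it through
    # Zeppelin's %html display system.
    img = BytesIO()
    p.savefig(img, format='svg')
    print("%html <div style='width:600px'>" + img.getvalue().decode('utf-8') + "</div>")

# sqlContext.table() already returns the whole table as a DataFrame; no select() needed.
df = sqlContext.table("fd")
df.printSchema()  # note the (): without it the method is never called
pdf = df.toPandas()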

g = sns.pairplot(data=pdf,
                 x_vars=["setting1","setting2"],
                 y_vars=["s4", "s3", 
                         "s9", "s8", 
                         "s13", "s6"],
                 hue="id", aspect=2)
show(g)   

(Figure: cluster visualization)

Recommended answer

You can register the DataFrame as a temporary table in Scala:

df.registerTempTable("df")

and read it in Python with SQLContext.table:

df = sqlContext.table("df")

If you really want to use put / get, you'll have to build the Python DataFrame yourself by wrapping the Java object that z.get returns:

z.put("df", df: org.apache.spark.sql.DataFrame)

from pyspark.sql import DataFrame

df = DataFrame(z.get("df"), sqlContext)

To plot with matplotlib, you'll have to convert the DataFrame to a local Python object with either collect or toPandas:

pdf = df.toPandas()

Please note that this fetches all the data to the driver, so the result must fit in the driver's memory.
