Is it possible to call a Python function from Scala (Spark)?


Problem description

I am creating a Spark job that requires a column to be added to a dataframe using a function written in Python. The rest of the processing is done using Scala.

I have found examples of how to call a Java/Scala function from pyspark.

The only examples I have found of sending data the other way use pipe, as sketched below.
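For reference, a minimal sketch of the pipe approach, assuming a DataFrame `df` and a hypothetical script `add_column.py` that reads CSV lines on stdin and writes augmented lines to stdout:

```scala
// Minimal sketch of the pipe approach (the script name and CSV wire
// format are hypothetical). Rows are serialized to text, streamed
// through an external Python process, and come back as an RDD[String].
val piped = df.rdd
  .map(row => row.mkString(","))   // serialize each Row to one CSV line
  .pipe("python add_column.py")    // script reads stdin, writes stdout
```

The drawback is that the schema is lost on the way out: the returned RDD[String] has to be parsed back into a DataFrame by hand.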

Is it possible for me to send the entire dataframe to a Python function, have the function manipulate the data and add additional columns, and then send the resulting dataframe back to the calling Scala function?

If this isn't possible, my current solution is to run a pyspark process and call multiple Scala functions to manipulate the dataframe, which isn't ideal.

Recommended answer

Just register a UDF from Python, and then from Scala evaluate a SQL statement that uses the function against a DataFrame. It works like a charm; just tried it ;) https://github.com/jupyter/docker-stacks/tree/master/all-spark-notebook is a good way to run a notebook in Toree that mixes Scala and Python code calling the same Spark context.
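A minimal sketch of that recipe, assuming a Spark 2.x-style session named `spark` shared by both cells (as in the Toree notebook above); the UDF name `add_label`, its logic, and the column `amount` are all hypothetical:

```python
# Python cell: register a UDF on the shared Spark session.
# The name "add_label" and the labeling logic are made up for illustration.
from pyspark.sql.types import StringType

def label(amount):
    return "high" if amount > 10 else "low"

spark.udf.register("add_label", label, StringType())
```

```scala
// Scala cell: the Python-registered function is now visible to Spark SQL,
// so any DataFrame exposed as a view can use it.
df.createOrReplaceTempView("input")
val labeled = spark.sql("SELECT *, add_label(amount) AS label FROM input")
```

The function still executes in Python worker processes, so each row pays the usual Python UDF serialization cost, but the Scala side gets the result back as an ordinary DataFrame.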

