How to register a Python function as a UDF in Spark SQL from Java/Scala?


Question

I have a few very, very simple functions in Python that I would like to use as UDFs in Spark SQL. It seems easy to register and use them from Python, but I would like to use them from Java/Scala when using JavaSQLContext or SQLContext. I noted that in Spark 1.2.1 there is a function registerPython, but it is neither clear to me how to use it nor whether I should ...

Any ideas on how to do this? I think it might have gotten easier in 1.3.0, but I'm limited to 1.2.1.

As I'm no longer working on this, I'm interested in knowing how to do it in any Spark version.

Recommended Answer

Given that the latest implementation of Spark UDFs (2.3.1 documentation) doesn't include any Python UDF registration functionality (Scala and Java only), I'd recommend leveraging Jython to call your Python functions.

You'll be able to define a Java class with methods that call Jython to run your Python functions, then register those Java methods as UDFs within your SQL context. While this is more roundabout than directly registering Python code as a UDF, it has the benefit of complying with current patterns and keeping the language context switch maintainable. A sketch of the approach follows.
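Below is a minimal sketch of that idea against the Spark 2.x Java API, assuming Jython 2.7 is on the classpath. The `PyStringFunction` wrapper, the `shout` Python function, and all other names here are hypothetical illustrations, not part of Spark or Jython:

```java
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.api.java.UDF1;
import org.apache.spark.sql.types.DataTypes;
import org.python.core.PyObject;
import org.python.core.PyString;
import org.python.util.PythonInterpreter;

public class JythonUdfExample {

    // Wraps a Python function (given as source text) in a serializable
    // Java UDF by evaluating it with an embedded Jython interpreter.
    public static class PyStringFunction implements UDF1<String, String> {
        private final String pythonSource;   // e.g. "def shout(s): ..."
        private final String functionName;   // name to look up after exec
        private transient PyObject function; // rebuilt lazily on each executor

        public PyStringFunction(String pythonSource, String functionName) {
            this.pythonSource = pythonSource;
            this.functionName = functionName;
        }

        @Override
        public String call(String input) {
            if (function == null) {
                PythonInterpreter interpreter = new PythonInterpreter();
                interpreter.exec(pythonSource);            // define the function
                function = interpreter.get(functionName);  // fetch it as a PyObject
            }
            // Call into Python and convert the result back to a Java String.
            return function.__call__(new PyString(input)).asString();
        }
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("jython-udf")
                .master("local[*]")
                .getOrCreate();

        // Hypothetical one-line Python function exposed as a SQL UDF.
        String source = "def shout(s):\n    return s.upper() + '!'";
        spark.udf().register("shout",
                new PyStringFunction(source, "shout"),
                DataTypes.StringType);

        spark.sql("SELECT shout('hello') AS result").show();  // HELLO!
    }
}
```

The `PyObject` field is marked transient and initialized lazily because `PythonInterpreter` is not serializable: only the Python source string and function name travel with the UDF, and each executor builds its own interpreter on first use.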
