How to access custom UDFs through Spark Thrift Server?

Problem description

I am running Spark Thrift Server on EMR. I start up the Spark Thrift Server by:

  sudo -u spark /usr/lib/spark/sbin/start-thriftserver.sh --queue interactive.thrift --jars /opt/lib/custom-udfs.jar

Notice that I have a custom UDF jar and I want to add it to the Thrift Server classpath, so I added --jars /opt/lib/custom-udfs.jar to the command above.

Once on my EMR cluster, I issue the following to connect to the Spark Thrift Server:

beeline -u jdbc:hive2://localhost:10000/default

Then I was able to issue commands like show databases. But how do I access the custom UDF? I thought that adding the --jars option to the Thrift Server startup script would also register the jar as a Hive resource.

The only way I can access the custom UDF right now is by adding the UDF jar as a Hive resource:

add jar /opt/lib/custom-udfs.jar

Then I create a function for the UDF.
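Inside the Beeline session, the two-step registration looks roughly like this (the function name and UDF class below are hypothetical placeholders; substitute whatever class your jar actually exports):

```sql
-- Register the jar with the current session
ADD JAR /opt/lib/custom-udfs.jar;

-- Bind a SQL function name to the UDF class inside the jar
-- (class name is an example, not from the question)
CREATE TEMPORARY FUNCTION my_upper AS 'com.example.udfs.MyUpper';

-- The function is now callable in queries for this session
SELECT my_upper(name) FROM default.some_table;
```

Because the function is TEMPORARY, it disappears when the session ends, which is exactly the repetition the question is asking to avoid.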

Question: Is there a way to configure the custom UDF jar automatically, without having to add the jar to each Spark session?

Thanks!

Answer

The easiest way is to edit the file start-thriftserver.sh so that, at the end, it:

  1. Waits until the server is ready
  2. Executes the setup SQL queries
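The two steps above can be sketched as a wrapper script (a rough sketch, not a tested deployment script: the port, the nc-based readiness probe, and the setup-file path /opt/lib/setup-udfs.sql are all assumptions to adapt to your cluster):

```shell
#!/bin/bash
# Start the Thrift Server with the custom UDF jar on its classpath
sudo -u spark /usr/lib/spark/sbin/start-thriftserver.sh \
  --queue interactive.thrift --jars /opt/lib/custom-udfs.jar

# 1. Wait until the server accepts connections on the JDBC port
until nc -z localhost 10000; do
  echo "Waiting for Spark Thrift Server..."
  sleep 5
done

# 2. Execute the setup SQL (ADD JAR / CREATE FUNCTION statements)
beeline -u jdbc:hive2://localhost:10000/default -f /opt/lib/setup-udfs.sql
```

With this in place, every restart of the Thrift Server re-registers the UDFs, so individual sessions no longer need to add the jar themselves.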

You could also post a proposal on the Spark JIRA; "execute setup code at startup" would be a very useful feature.
