How to access custom UDFs through Spark Thrift Server?
Question
I am running Spark Thrift Server on EMR. I start the Spark Thrift Server with:
sudo -u spark /usr/lib/spark/sbin/start-thriftserver.sh --queue interactive.thrift --jars /opt/lib/custom-udfs.jar
Notice that I have a custom UDF jar that I want to add to the Thrift Server classpath, which is why I added --jars /opt/lib/custom-udfs.jar to the command above.
Once I am on the EMR cluster, I issue the following to connect to the Spark Thrift Server:
beeline -u jdbc:hive2://localhost:10000/default
Then I am able to issue commands like show databases. But how do I access the custom UDF? I thought that adding the --jars option to the Thrift Server startup script would also make the jar available as a Hive resource.
The only way I can access the custom UDF right now is by adding the custom UDF jar as a Hive resource:
add jar /opt/lib/custom-udfs.jar
Then I create a function from the UDF.
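For reference, the per-session setup described above looks roughly like this in beeline. The jar path is the one from the question; the class name and function name are hypothetical placeholders, since the actual contents of custom-udfs.jar are not shown:

```sql
-- Register the jar for the current session (path from the question).
ADD JAR /opt/lib/custom-udfs.jar;

-- Bind a SQL function name to a UDF class. The class name below is a
-- hypothetical example, not taken from the question.
CREATE TEMPORARY FUNCTION my_upper AS 'com.example.udfs.MyUpper';

-- The function can then be used like any built-in.
SELECT my_upper(name) FROM default.users LIMIT 10;
```

Because CREATE TEMPORARY FUNCTION is session-scoped, these two statements have to be repeated in every new session, which is exactly the inconvenience the question is about.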
Question: Is there a way to configure the custom UDF jar automatically, without adding the jar to every Spark session?
Thanks!
Answer
The easiest way is to edit the file start-thriftserver.sh so that, at the end, it:
- waits until the server is ready
- executes the setup SQL queries
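The two steps above could be sketched as a snippet appended to start-thriftserver.sh. This is a rough, untested illustration under several assumptions: the server listens on localhost:10000 as in the question, bash's /dev/tcp feature is available, and the ADD JAR / CREATE FUNCTION statements have been collected into a file (the path /opt/lib/setup.sql is invented for this example):

```shell
# --- hypothetical addition at the end of start-thriftserver.sh ---

# 1. Wait until the Thrift Server accepts connections on port 10000
#    (uses bash's /dev/tcp pseudo-device to probe the port).
until (exec 3<>/dev/tcp/localhost/10000) 2>/dev/null; do
  echo "Waiting for Spark Thrift Server..."
  sleep 5
done

# 2. Run the setup queries (ADD JAR / CREATE FUNCTION statements)
#    from a file; the path below is an assumed location.
beeline -u jdbc:hive2://localhost:10000/default -f /opt/lib/setup.sql
```

Whether functions registered this way are visible to later beeline sessions depends on how the Thrift Server shares session state, so this should be verified on the actual cluster.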
You could also post a proposal on JIRA; "execute setup code at start-up" would be a very useful feature.