Unable to use an existing Hive permanent UDF from Spark SQL
Problem description
I have previously registered a UDF with Hive. It is permanent, not TEMPORARY. It works in beeline.
CREATE FUNCTION normaliseURL AS 'com.example.hive.udfs.NormaliseURL' USING JAR 'hdfs://udfs/hive-udfs.jar';
I have Spark configured to use the Hive metastore. The config is working, as I can query Hive tables. I can see the UDF:
In [9]: spark.sql('describe function normaliseURL').show(truncate=False)
+-------------------------------------------+
|function_desc |
+-------------------------------------------+
|Function: default.normaliseURL |
|Class: com.example.hive.udfs.NormaliseURL |
|Usage: N/A. |
+-------------------------------------------+
However, I cannot use the UDF in a SQL statement:
spark.sql('SELECT normaliseURL("value")')
AnalysisException: "Undefined function: 'default.normaliseURL'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; line 1 pos 7"
If I attempt to register the UDF with Spark (bypassing the metastore), it fails, suggesting that the function does already exist.
In [12]: spark.sql("create function normaliseURL as 'com.example.hive.udfs.NormaliseURL'")
AnalysisException: "Function 'default.normaliseURL' already exists in database 'default';"
I'm using Spark 2.0 and Hive metastore 1.1.0. The UDF is written in Scala; my Spark driver code is Python.
I'm stumped.
- Am I correct in my assumption that Spark can utilise metastore-defined permanent UDFs?
- Am I creating the function correctly in hive?
Solution

The issue is that Spark 2.0 is not able to execute functions whose JARs are located on HDFS.
See also: Spark SQL: Thriftserver unable to run a registered Hive UDTF
One workaround is to define the function as a temporary function in the Spark job, with the jar path pointing to a local edge-node path, and then call the function in the same Spark job.
CREATE TEMPORARY FUNCTION functionName AS 'com.test.HiveUDF' USING JAR '/user/home/dir1/functions.jar'
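In a PySpark driver the workaround might be sketched as follows. This is a minimal sketch: the function name and class come from the question, but the local jar path `/home/user/hive-udfs.jar` is an assumed edge-node location (the jar would first be copied off HDFS, e.g. with `hdfs dfs -get`), and the `spark.sql(...)` calls are shown as comments because they need a live SparkSession.

```python
# Sketch of the workaround: build the CREATE TEMPORARY FUNCTION statement
# and run it in the same Spark job that calls the UDF.
# The class name is from the question; the local jar path is a hypothetical
# edge-node location, not taken from the question.

def temp_udf_ddl(name: str, class_name: str, local_jar: str) -> str:
    """Return the DDL that registers a Hive UDF as a temporary function."""
    return (
        f"CREATE TEMPORARY FUNCTION {name} "
        f"AS '{class_name}' USING JAR '{local_jar}'"
    )

ddl = temp_udf_ddl(
    "normaliseURL",
    "com.example.hive.udfs.NormaliseURL",
    "/home/user/hive-udfs.jar",  # hypothetical local copy of the jar
)
print(ddl)

# In the driver, still inside the same SparkSession:
#   spark.sql(ddl)
#   spark.sql("SELECT normaliseURL(url) FROM logs").show()
```

Because the function is TEMPORARY, it only exists for the lifetime of that SparkSession, so the registration and every query that uses the UDF must run in the same job.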