I want to convert all my existing UDTFs in Hive to Scala functions and use them from Spark SQL
Problem description
Can anyone give me an example of a UDTF (e.g. explode) written in Scala that returns multiple rows, and show how to use it as a UDF in Spark SQL?
Table: table1
+------+----------+----------+
|userId|someString| varA|
+------+----------+----------+
| 1| example1| [0, 2, 5]|
| 2| example2|[1, 20, 5]|
+------+----------+----------+
I'd like to create the following Scala code:
def exampleUDTF(varA: Seq[Int]) = <Return Type???> {
  // code to explode the varA field ???
}
sqlContext.udf.register("exampleUDTF",exampleUDTF _)
sqlContext.sql("FROM table1 SELECT userId, someString, exampleUDTF(varA)").collect().foreach(println)
Expected output:
+------+----------+----+
|userId|someString|varA|
+------+----------+----+
| 1| example1| 0|
| 1| example1| 2|
| 1| example1| 5|
| 2| example2| 1|
| 2| example2| 20|
| 2| example2| 5|
+------+----------+----+
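The fan-out the expected output describes can be checked without a Spark cluster. The sketch below reproduces it on plain Scala collections; `InputRow` and the sample data are illustrative stand-ins for the `table1` rows, not part of any Spark API:

```scala
// Conceptual sketch of what "explode" does, on plain Scala collections.
// InputRow is a hypothetical stand-in for a row of table1.
case class InputRow(userId: Int, someString: String, varA: Seq[Int])

val table1 = Seq(
  InputRow(1, "example1", Seq(0, 2, 5)),
  InputRow(2, "example2", Seq(1, 20, 5))
)

// Each input row fans out into one output row per element of varA.
val exploded: Seq[(Int, String, Int)] =
  table1.flatMap(r => r.varA.map(v => (r.userId, r.someString, v)))

exploded.foreach(println)
// (1,example1,0), (1,example1,2), (1,example1,5),
// (2,example2,1), (2,example2,20), (2,example2,5)
```

This is exactly why a plain `sqlContext.udf.register` UDF cannot do the job: a UDF maps one row to one value, while the desired operation maps one row to several rows.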
Hive Table:
name id
["Subhajit Sen","Binoy Mondal","Shantanu Dutta"] 15
["Gobinathan SP","Harsh Gupta","Rahul Anand"] 16
- Creating a Scala function:
def toUpper(name: Seq[String]) = (name.map(a => a.toUpperCase)).toSeq
- Registering the function as a UDF:
sqlContext.udf.register("toUpper",toUpper _)
- Calling the UDF using sqlContext and storing the output as a DataFrame object:
val df = sqlContext.sql("SELECT toUpper(name) FROM namelist").toDF("Name")
- Exploding the DataFrame:
df.explode(df("Name")) { case org.apache.spark.sql.Row(arr: Seq[String]) =>
  arr.map(v => Tuple1(v))
}.drop(df("Name")).withColumnRenamed("_1", "Name").show
Result:
+--------------+
| Name|
+--------------+
| SUBHAJIT SEN|
| BINOY MONDAL|
|SHANTANU DUTTA|
| GOBINATHAN SP|
| HARSH GUPTA|
| RAHUL ANAND|
+--------------+
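The core of the answer, uppercasing each name and then flattening one name per output row, can be exercised outside Spark as well. This sketch mirrors that pipeline on plain Scala collections, using the sample data from the Hive table (the `namelist` value here is an illustrative stand-in for the table, not a Spark object):

```scala
// The registered UDF from the answer: ordinary Scala, testable on its own.
def toUpper(name: Seq[String]): Seq[String] = name.map(_.toUpperCase)

// Stand-in for the two rows of the namelist Hive table.
val namelist: Seq[Seq[String]] = Seq(
  Seq("Subhajit Sen", "Binoy Mondal", "Shantanu Dutta"),
  Seq("Gobinathan SP", "Harsh Gupta", "Rahul Anand")
)

// Apply the UDF per row, then flatten: the same shape the
// DataFrame.explode call produces (one name per output row).
val result: Seq[String] = namelist.flatMap(toUpper)
result.foreach(println)
```

Note that `DataFrame.explode` was deprecated in Spark 2.x; on newer versions the same result is typically obtained with the built-in `org.apache.spark.sql.functions.explode` column function instead of a row-level lambda.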