I want to convert all my existing UDTFs in Hive to Scala functions and use them from Spark SQL
Problem description
Can anyone give me an example of a UDTF (e.g. explode) written in Scala that returns multiple rows, and show how to use it as a UDF in Spark SQL?
Table: table1
+------+----------+----------+
|userId|someString| varA|
+------+----------+----------+
| 1| example1| [0, 2, 5]|
| 2| example2|[1, 20, 5]|
+------+----------+----------+
I'd like to create the following Scala code:
def exampleUDTF(varA: Seq[Int]) = <Return Type???> {
  // code to explode the varA field ???
}
sqlContext.udf.register("exampleUDTF",exampleUDTF _)
sqlContext.sql("FROM table1 SELECT userId, someString, exampleUDTF(varA)").collect().foreach(println)
Expected output:
+------+----------+----+
|userId|someString|varA|
+------+----------+----+
| 1| example1| 0|
| 1| example1| 2|
| 1| example1| 5|
| 2| example2| 1|
| 2| example2| 20|
| 2| example2| 5|
+------+----------+----+
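The fan-out the expected output describes can be checked without a Spark cluster. The sketch below reproduces it on plain Scala collections; `InputRow` and the sample data are illustrative stand-ins for the `table1` rows, not part of any Spark API:

```scala
// Conceptual sketch of what "explode" does, on plain Scala collections.
// InputRow is a hypothetical stand-in for a row of table1.
case class InputRow(userId: Int, someString: String, varA: Seq[Int])

val table1 = Seq(
  InputRow(1, "example1", Seq(0, 2, 5)),
  InputRow(2, "example2", Seq(1, 20, 5))
)

// Each input row fans out into one output row per element of varA.
val exploded: Seq[(Int, String, Int)] =
  table1.flatMap(r => r.varA.map(v => (r.userId, r.someString, v)))

exploded.foreach(println)
// (1,example1,0), (1,example1,2), (1,example1,5),
// (2,example2,1), (2,example2,20), (2,example2,5)
```

This is exactly why a plain `sqlContext.udf.register` UDF cannot do the job: a UDF maps one row to one value, while the desired operation maps one row to several rows.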
Hive Table:
name id
["Subhajit Sen","Binoy Mondal","Shantanu Dutta"] 15
["Gobinathan SP","Harsh Gupta","Rahul Anand"] 16
- Creating a Scala function:
def toUpper(name: Seq[String]) = (name.map(a => a.toUpperCase)).toSeq
- Registering the function as a UDF:
sqlContext.udf.register("toUpper",toUpper _)
- Calling the UDF using sqlContext and storing the output as a DataFrame object:
val df = sqlContext.sql("SELECT toUpper(name) FROM namelist").toDF("Name")
- Exploding the DataFrame:
df.explode(df("Name")) { case org.apache.spark.sql.Row(arr: Seq[String]) =>
  arr.map(v => Tuple1(v))
}.drop(df("Name")).withColumnRenamed("_1", "Name").show
Result:
+--------------+
| Name|
+--------------+
| SUBHAJIT SEN|
| BINOY MONDAL|
|SHANTANU DUTTA|
| GOBINATHAN SP|
| HARSH GUPTA|
| RAHUL ANAND|
+--------------+
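The core of the answer, uppercasing each name and then flattening one name per output row, can be exercised outside Spark as well. This sketch mirrors that pipeline on plain Scala collections, using the sample data from the Hive table (the `namelist` value here is an illustrative stand-in for the table, not a Spark object):

```scala
// The registered UDF from the answer: ordinary Scala, testable on its own.
def toUpper(name: Seq[String]): Seq[String] = name.map(_.toUpperCase)

// Stand-in for the two rows of the namelist Hive table.
val namelist: Seq[Seq[String]] = Seq(
  Seq("Subhajit Sen", "Binoy Mondal", "Shantanu Dutta"),
  Seq("Gobinathan SP", "Harsh Gupta", "Rahul Anand")
)

// Apply the UDF per row, then flatten: the same shape the
// DataFrame.explode call produces (one name per output row).
val result: Seq[String] = namelist.flatMap(toUpper)
result.foreach(println)
```

Note that `DataFrame.explode` was deprecated in Spark 2.x; on newer versions the same result is typically obtained with the built-in `org.apache.spark.sql.functions.explode` column function instead of a row-level lambda.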