I want to convert all my existing UDTFs in Hive to Scala functions and use them from Spark SQL


Problem Description


Can anyone give me an example of a UDTF (e.g. explode) written in Scala that returns multiple rows, and show how to use it as a UDF in Spark SQL?

Table: table1

    +------+----------+----------+
    |userId|someString|      varA|
    +------+----------+----------+
    |     1|  example1| [0, 2, 5]|
    |     2|  example2|[1, 20, 5]|
    +------+----------+----------+

I'd like to create the following Scala code:

    def exampleUDTF(var: Seq[Int]) = <Return Type???> {
      // code to explode varA field ???
    }

    sqlContext.udf.register("exampleUDTF", exampleUDTF _)

    sqlContext.sql("FROM table1 SELECT userId, someString, exampleUDTF(varA)").collect().foreach(println)

Expected output:

    +------+----------+----+
    |userId|someString|varA|
    +------+----------+----+
    |     1|  example1|   0|
    |     1|  example1|   2|
    |     1|  example1|   5|
    |     2|  example2|   1|
    |     2|  example2|  20|
    |     2|  example2|   5|
    +------+----------+----+
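To make the asked-for behaviour concrete, here is a Spark-free sketch of the fan-out in plain Scala; the case classes `In` and `Out` are illustrative stand-ins for the table rows, not Spark types:

```scala
object ExplodeSketch {
  // Plain-Scala sketch of the desired UDTF: a row carrying a Seq[Int]
  // fans out into one output row per element (explode semantics).
  case class In(userId: Int, someString: String, varA: Seq[Int])
  case class Out(userId: Int, someString: String, varA: Int)

  def explodeRows(rows: Seq[In]): Seq[Out] =
    rows.flatMap(r => r.varA.map(v => Out(r.userId, r.someString, v)))

  // In-memory stand-in for table1 above.
  val table1 = Seq(
    In(1, "example1", Seq(0, 2, 5)),
    In(2, "example2", Seq(1, 20, 5))
  )

  def main(args: Array[String]): Unit =
    explodeRows(table1).foreach(println) // six rows, one per array element
}
```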

Solution

Hive table:

    name                                              id
    ["Subhajit Sen","Binoy Mondal","Shantanu Dutta"]  15
    ["Gobinathan SP","Harsh Gupta","Rahul Anand"]     16

1. Create a Scala function:

        def toUpper(name: Seq[String]): Seq[String] = name.map(a => a.toUpperCase)

2. Register the function as a UDF:

        sqlContext.udf.register("toUpper", toUpper _)

3. Call the UDF via sqlContext and store the output as a DataFrame:

        val df = sqlContext.sql("SELECT toUpper(name) FROM namelist").toDF("Name")

4. Explode the DataFrame:

        df.explode(df("Name")) {
          case org.apache.spark.sql.Row(arr: Seq[String]) => arr.map(v => Tuple1(v))
        }.drop(df("Name")).withColumnRenamed("_1", "Name").show

Result:

    +--------------+
    |          Name|
    +--------------+
    |  SUBHAJIT SEN|
    |  BINOY MONDAL|
    |SHANTANU DUTTA|
    | GOBINATHAN SP|
    |   HARSH GUPTA|
    |   RAHUL ANAND|
    +--------------+
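The pipeline above can be sanity-checked without a cluster, since `toUpper` and the explode lambda are plain collection code; this sketch mirrors steps 1 and 4 on an in-memory stand-in for the `namelist` table:

```scala
object SolutionSketch {
  // Step 1: the UDF body is ordinary Scala.
  def toUpper(name: Seq[String]): Seq[String] = name.map(_.toUpperCase)

  // In-memory stand-in for the Hive table's name column.
  val namelist = Seq(
    Seq("Subhajit Sen", "Binoy Mondal", "Shantanu Dutta"),
    Seq("Gobinathan SP", "Harsh Gupta", "Rahul Anand")
  )

  // Step 4: the explode lambda maps each array to one Tuple1 per
  // element; flattening then yields one "row" per name.
  val exploded: Seq[String] =
    namelist.map(toUpper).flatMap(arr => arr.map(v => Tuple1(v))).map(_._1)

  def main(args: Array[String]): Unit = exploded.foreach(println)
}
```

Note that `DataFrame.explode` as used in step 4 was deprecated in Spark 2.0; in later versions the usual replacement is the `explode` column function from `org.apache.spark.sql.functions`, e.g. `df.select(explode(df("Name")).as("Name"))`.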
