Selecting a range of elements in an array in Spark SQL
Problem Description
I recently loaded a table with an array column in spark-sql, and I am using spark-shell to perform the operations below.
Here is the DDL for the table:
create table test_emp_arr(
  dept_id string,
  dept_nm string,
  emp_details array<string>
)
The data looks something like this:
+-------+-------+-------------------------------+
|dept_id|dept_nm|                    emp_details|
+-------+-------+-------------------------------+
|     10|Finance|[Jon, Snow, Castle, Black, Ned]|
|     20|     IT|            [Ned, is, no, more]|
+-------+-------+-------------------------------+
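If you want to follow along in spark-shell, here is a minimal, hypothetical setup sketch that builds an equivalent DataFrame (the val name empData and the temp-table registration are assumptions; it is registered under the name emp_details because that is the table name used in the queries below):
// Sketch only: recreate the sample data as a DataFrame and register it as a
// temporary table, assuming a Spark 1.x-style sqlContext as used in this post.
import sqlContext.implicits._

val empData = Seq(
  ("10", "Finance", Seq("Jon", "Snow", "Castle", "Black", "Ned")),
  ("20", "IT", Seq("Ned", "is", "no", "more"))
).toDF("dept_id", "dept_nm", "emp_details")

empData.registerTempTable("emp_details")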
I can query the emp_details column like this:
sqlContext.sql("select emp_details[0] from emp_details").show
Problem
I want to query a range of elements in the collection. These are the queries I expected to work:
sqlContext.sql("select emp_details[0-2] from emp_details").show
or
sqlContext.sql("select emp_details[0:2] from emp_details").show
Expected output:
+-------------------+
|        emp_details|
+-------------------+
|[Jon, Snow, Castle]|
|      [Ned, is, no]|
+-------------------+
In pure Scala, if I have an array such as:
val emp_details = Array("Jon", "Snow", "Castle", "Black")
I can get the elements in the range 0 to 2 using
emp_details.slice(0, 3)
which returns
Array(Jon, Snow, Castle)
I am not able to apply this array operation in spark-sql. Any help?
Thanks
Solution
Here is a solution using a user-defined function (UDF), which has the advantage of working for any slice size you want. It simply builds a UDF around the Scala built-in slice method:
import sqlContext.implicits._
import org.apache.spark.sql.functions._

val slice = udf((array: Seq[String], from: Int, to: Int) => array.slice(from, to))
Example with a sample of your data:
val df = sqlContext.sql("select array('Jon', 'Snow', 'Castle', 'Black', 'Ned') as emp_details")
df.withColumn("slice", slice($"emp_details", lit(0), lit(3))).show
This produces the expected output:
+--------------------+-------------------+
|         emp_details|              slice|
+--------------------+-------------------+
|[Jon, Snow, Castl...|[Jon, Snow, Castle]|
+--------------------+-------------------+
You can also register the UDF in your sqlContext and use it like this:
sqlContext.udf.register("slice", (array: Seq[String], from: Int, to: Int) => array.slice(from, to))
sqlContext.sql("select array('Jon','Snow','Castle','Black','Ned'), slice(array('Jon','Snow','Castle','Black','Ned'), 0, 3)")
You won't need lit anymore with this solution.
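For example, against the question's own table (assuming it is registered as emp_details, the name used in the queries above), the registered UDF can be called directly from SQL with plain integer literals; a sketch:
// Sketch only: the table name emp_details is taken from the question's queries.
sqlContext.sql("select emp_details, slice(emp_details, 0, 3) as sliced from emp_details").show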