Get the last element from the Apache Spark SQL split() function

Question

I want to get the last element of the array returned by the Spark SQL split() function.

split('4:3-2:3-5:4-6:4-5:2', '-')

I know I can get it with a fixed index:

split('4:3-2:3-5:4-6:4-5:2', '-')[4]

But I want another way that works when I don't know the length of the array. Please help.

Recommended answer

You could use a UDF to do that, as follows:

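import scala.util.Try
import org.apache.spark.sql.functions.{col, split, udf}
// toDF needs spark.implicits._ in scope (spark-shell imports it automatically)

val df = sc.parallelize(Seq((1L,"one-last1"), (2L,"two-last2"), (3L,"three-last3"))).toDF("key","Value")
df.show(false)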
+---+-----------+
|key|Value      |
+---+-----------+
|1  |one-last1  |
|2  |two-last2  |
|3  |three-last3|
+---+-----------+

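// Try(...).toOption gives None (a null cell) instead of an exception when the array is empty or null
val get_last = udf((xs: Seq[String]) => Try(xs.last).toOption)

val with_just_last = df.withColumn("Last" , get_last(split(col("Value"), "-")))
with_just_last.show(false)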
+---+-----------+--------+
|key|Value      |Last    |
+---+-----------+--------+
|1  |one-last1  |last1   |
|2  |two-last2  |last2   |
|3  |three-last3|last3   |
+---+-----------+--------+
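
The UDF body above can be tried on its own, without Spark, to see why `xs.last` is wrapped in `Try(...).toOption`. This is only a sketch: the helper name `lastOrNone` and the object name `LastOrNoneDemo` are made up for illustration and are not part of the original answer.

```scala
import scala.util.Try

object LastOrNoneDemo {
  // Same expression as the UDF body; `lastOrNone` is a hypothetical name.
  def lastOrNone(xs: Seq[String]): Option[String] = Try(xs.last).toOption

  def main(args: Array[String]): Unit = {
    // Normal case: the last piece after splitting on "-"
    println(lastOrNone("one-last1".split("-").toSeq)) // Some(last1)
    // Empty sequence: xs.last throws, Try turns the failure into None
    println(lastOrNone(Seq.empty[String]))            // None
    // Null input: the NullPointerException is also caught
    println(lastOrNone(null))                         // None
  }
}
```

When a UDF returns `None`, Spark stores a null in the resulting column, so rows with an empty or null array do not fail the job.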

Remember that the split function from Spark SQL can be applied to a column of the DataFrame.
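
As an alternative to a UDF, the built-in functions may be enough. The sketch below is not from the original answer. It assumes Spark 2.4 or later for `element_at`, which accepts a negative index counted from the end of the array, and it also shows `substring_index`, which with a count of -1 keeps the text after the final delimiter without building an array at all. The object name, the local session settings, and the new column names are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, element_at, split, substring_index}

object LastElementBuiltins {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in spark-shell `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("last-element-sketch")
      .getOrCreate()
    import spark.implicits._

    // Same sample data as in the answer above
    val df = Seq((1L, "one-last1"), (2L, "two-last2"), (3L, "three-last3"))
      .toDF("key", "Value")

    val result = df
      // Index -1 reads the array from its end (assumes Spark 2.4+)
      .withColumn("LastViaElementAt", element_at(split(col("Value"), "-"), -1))
      // Count -1 keeps everything to the right of the last "-"
      .withColumn("LastViaSubstringIndex", substring_index(col("Value"), "-", -1))

    result.show(false)
    spark.stop()
  }
}
```

In plain SQL, under the same version assumption, the expression from the question would become `element_at(split('4:3-2:3-5:4-6:4-5:2', '-'), -1)`. Built-in functions are usually preferable to a UDF when they fit, because the optimizer can see into them.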
