How to create a column with all the values in a range given by another column


Problem description

I have a problem with the following scenario in Spark. I have a DataFrame in which one column contains an array holding a start value and an end value, e.g.

[1000, 1010]

I would like to know how to create and compute another column containing an array that holds all the values in the given range. The generated range column would look like this:

    +--------------+-------------+-----------------------------+
    |   Description|     Accounts|                        Range|
    +--------------+-------------+-----------------------------+
    |       Range 1|   [101, 105]|    [101, 102, 103, 104, 105]|
    |       Range 2|   [200, 203]|         [200, 201, 202, 203]|
    +--------------+-------------+-----------------------------+

Thanks in advance.

Recommended answer

You'll have to create a UDF for this. Starting from this DataFrame:

df.show
+-----------+----------+
|Description|  Accounts|
+-----------+----------+
|    Range 1|[100, 105]|
|    Range 2|[200, 203]|
+-----------+----------+

I have tried to cover a few of the possible edge cases here. You can add more if you see anything missing.

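import org.apache.spark.sql.functions.udf

// Expands an array of [start, end] into the full inclusive range.
val createRange = udf { (xs: Seq[Int]) =>
  if (xs == null || xs.isEmpty) Array.empty[Int]  // null or empty input: empty range
  else if (xs.length == 1) (0 to xs(0)).toArray   // single value: range from 0 up to it
  else (xs(0) to xs(1)).toArray                   // otherwise: inclusive range start..end
}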
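
The edge cases are easier to reason about outside Spark. The following is a minimal sketch of the same length-based branching as a plain Scala function, assuming it is run in the REPL or as a script; `expandRange` is a hypothetical helper name, not part of the original answer.

```scala
// Plain-Scala version of the range logic, so the edge cases can be checked without Spark.
// `expandRange` is a hypothetical name used only for illustration.
def expandRange(xs: Seq[Int]): Array[Int] =
  if (xs.isEmpty) Array.empty[Int]               // no bounds: empty result
  else if (xs.length == 1) (0 to xs(0)).toArray  // one value: treated as the upper bound, starting at 0
  else (xs(0) to xs(1)).toArray                  // first two values are the inclusive bounds

println(expandRange(Seq(100, 105)).mkString("[", ", ", "]")) // [100, 101, 102, 103, 104, 105]
println(expandRange(Seq(3)).mkString("[", ", ", "]"))        // [0, 1, 2, 3]
println(expandRange(Seq(5, 1)).mkString("[", ", ", "]"))     // [] because start > end gives an empty range
```

Note that a reversed pair such as [5, 1] yields an empty array rather than a descending range. If descending ranges are wanted, that would be one more case to add.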

Call the UDF createRange on your DataFrame and pass it the Accounts array column:

df.withColumn("Range" , createRange($"Accounts") ).show(false)
+-----------+----------+------------------------------+
|Description|Accounts  |Range                         |
+-----------+----------+------------------------------+
|Range 1    |[100, 105]|[100, 101, 102, 103, 104, 105]|
|Range 2    |[200, 203]|[200, 201, 202, 203]          |
+-----------+----------+------------------------------+
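
On Spark 2.4 or later, the built-in `sequence` function in `org.apache.spark.sql.functions` may make the UDF unnecessary. The sketch below is an alternative to the answer's approach, not part of it. It assumes a local SparkSession and that every Accounts array has exactly two elements, and it does not reproduce the UDF's handling of empty or single-element arrays. The names `spark`, `df`, and `ranged` are illustrative.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, sequence}

// Assumption: a local session, used here only to make the sketch self-contained.
val spark = SparkSession.builder().master("local[*]").appName("sequence-sketch").getOrCreate()
import spark.implicits._

// Same sample data as in the answer above.
val df = Seq(
  ("Range 1", Seq(100, 105)),
  ("Range 2", Seq(200, 203))
).toDF("Description", "Accounts")

// sequence(start, stop) builds an inclusive array from start to stop.
val ranged = df.withColumn(
  "Range",
  sequence(col("Accounts").getItem(0), col("Accounts").getItem(1))
)

ranged.show(false)
```

One behavioral difference to keep in mind: when start is greater than stop, `sequence` counts down, whereas the Scala `to` used in the UDF produces an empty range.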
