Add a number-of-days column to a date column in the same DataFrame (Spark Scala app)


Question

I have a dataframe df with columns ("id", "current_date", "days"). I am trying to add "days" to "current_date" and create a new dataframe with a new column called "new_date", using the Spark Scala function date_add():

val newDF = df.withColumn("new_Date", date_add(df("current_date"), df("days").cast("Int")))

But it looks like the Scala function date_add only accepts an Int for the number of days, not a Column. How can I get the desired output in this case? Are there any alternative functions I can use?

Spark version: 1.6.0, Scala version: 2.10.6

Recommended answer

A small custom UDF makes this date arithmetic possible. It takes the date as a yyyy-MM-dd string and the number of days as an Int, and returns the shifted date as a string in the same format.

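import org.apache.spark.sql.functions.udf
import java.util.Calendar
import java.text.SimpleDateFormat

// UDF that adds y days to a date string x formatted as yyyy-MM-dd.
// Note: a val named date_add takes precedence over the built-in
// functions.date_add wherever this definition is in scope.
val date_add = udf((x: String, y: Int) => {
  val sdf = new SimpleDateFormat("yyyy-MM-dd")
  val cal = Calendar.getInstance()
  cal.setTime(sdf.parse(x))
  // Calendar arithmetic moves by calendar days, so the result stays correct
  // across daylight-saving changes, unlike adding a fixed number of milliseconds.
  cal.add(Calendar.DAY_OF_MONTH, y)
  sdf.format(cal.getTime)
})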
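
The date arithmetic itself can be checked outside Spark. The following is a minimal sketch, not part of the original answer: addDays is a hypothetical helper name, and it assumes a Java 8 or later runtime for java.time, which may not be available on the JVM used with the Spark 1.6 / Scala 2.10 setup in the question.

```scala
import java.time.LocalDate

// Hypothetical helper (not a Spark function): adds `days` calendar days
// to an ISO-8601 date string (yyyy-MM-dd) and returns the same format.
def addDays(date: String, days: Int): String =
  LocalDate.parse(date).plusDays(days.toLong).toString

println(addDays("2017-01-01", 10)) // 2017-01-11
println(addDays("2017-01-01", 20)) // 2017-01-21
```

LocalDate carries no time-of-day or time zone, so this form is not affected by daylight-saving transitions.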

Usage of the UDF:

scala> val df = Seq((1, "2017-01-01", 10), (2, "2017-01-01", 20)).toDF("id", "current_date", "days")
df: org.apache.spark.sql.DataFrame = [id: int, current_date: string, days: int]

scala> df.withColumn("new_Date", date_add($"current_date", $"days")).show()
+---+------------+----+----------+
| id|current_date|days|  new_Date|
+---+------------+----+----------+
|  1|  2017-01-01|  10|2017-01-11|
|  2|  2017-01-01|  20|2017-01-21|
+---+------------+----+----------+
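
A possible alternative that avoids a UDF, offered as a sketch rather than as part of the answer above: the SQL form of date_add takes expressions for both arguments, so it can be reached through functions.expr with the "days" column passed directly. This assumes a spark-shell session on Spark 1.5 or later (where expr is available and the shell's implicits provide toDF). Because the column name current_date also matches the name of a SQL function, resolution could differ between Spark versions; renaming the column is the safer choice if in doubt.

```scala
// Assumes spark-shell, where the implicits needed for toDF are already imported.
import org.apache.spark.sql.functions.expr

val df = Seq((1, "2017-01-01", 10), (2, "2017-01-01", 20))
  .toDF("id", "current_date", "days")

// Inside expr, date_add is the SQL function, so the per-row "days"
// column can be used as the second argument.
val newDF = df.withColumn("new_date", expr("date_add(current_date, days)"))
newDF.show()
```

With this approach the new column is produced by Spark's own date function rather than by string formatting, so it should come back as a date type instead of a string.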
