WithColumn: display a new column dateTime


Question

I have a Scala function that takes two LocalDateTime values as parameters and computes the difference between the two dates:

I applied the function to two fields of my DataFrame.

The new column seems to be added, because my DataFrame contains 7 fields and shows 8 after I apply the toEquals function. But when I call dfWithToEquals.printSchema(), it displays this error:

Can someone help me resolve this error so that I can display the new column containing the difference between these two dates?

Recommended answer

input_table.withColumn returns a new DataFrame rather than modifying input_table. So, to display the new column, assign the result and work with that:

val dfWithToEquals = input_table.withColumn("toEquals", toEquals($"start_date",$"finish_date"))
dfWithToEquals.printSchema()
dfWithToEquals.show()

Update

To resolve the Task not serializable exception: objects passed to Spark must be serializable. Here, the DATE_TIME_FORMATTER reference is created outside the udf and is not serializable. Try moving its instantiation inside the function:

def toEquals = udf((rd1: String, rd2: String) => {
  val formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
  val d1 = adjust(LocalDateTime.parse(rd1, formatter))
  val d2 = adjust(LocalDateTime.parse(rd2, formatter ), asc = false)
  // remaining code unchanged
})
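
For illustration only, here is a self-contained sketch of the same idea. It is not the asker's code: the adjust helper and the rest of the original function are not shown, so diffSeconds is a hypothetical stand-in that returns the difference in seconds, and the one-row sample data is made up. It assumes Spark 2.x or later on the classpath and a local SparkSession.

```scala
import java.time.{Duration, LocalDateTime}
import java.time.format.DateTimeFormatter

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object ToEqualsSketch {
  // Hypothetical helper (not from the original question): the formatter is
  // created inside the function, so the UDF closure carries no
  // non-serializable state to the executors.
  def diffSeconds(rd1: String, rd2: String): Long = {
    val formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
    val d1 = LocalDateTime.parse(rd1, formatter)
    val d2 = LocalDateTime.parse(rd2, formatter)
    Duration.between(d1, d2).getSeconds
  }

  // Stand-in for the asker's toEquals; the real one also calls adjust(...).
  val toEquals = udf((rd1: String, rd2: String) => diffSeconds(rd1, rd2))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("toEquals-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample row using the column names from the answer.
    val input_table = Seq(("2019-01-01 08:00:00", "2019-01-01 09:30:00"))
      .toDF("start_date", "finish_date")

    // withColumn returns a new DataFrame; keep the result and display it.
    val dfWithToEquals =
      input_table.withColumn("toEquals", toEquals($"start_date", $"finish_date"))
    dfWithToEquals.printSchema()
    dfWithToEquals.show()

    spark.stop()
  }
}
```

Building the formatter on every call costs a little extra work per row. If that matters, one common alternative is to hold it in a top-level object, which each executor initializes for itself instead of receiving it through serialization.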

End of update
