WithColumn: display a new column dateTime
Question
I have a Scala function that takes two LocalDateTime values as parameters and computes the difference between the two dates:
I applied the function to two fields of my DataFrame.
The new column seems to be added, because my DataFrame contains 7 fields and shows 8 fields after applying the function toEquals. But when I call dfWithToEquals.printSchema(), it displays this error:
Can someone help me resolve this error so that I can display the new column containing the difference between these two dates?
Recommended answer
input_table.withColumn returns a new DataFrame; it does not modify input_table. So, to display the result, assign it to a value and call the display methods on that value:
val dfWithToEquals = input_table.withColumn("toEquals", toEquals($"start_date",$"finish_date"))
dfWithToEquals.printSchema()
dfWithToEquals.show()
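The point that withColumn leaves the original DataFrame untouched can be checked with a small local sketch. The sample rows, the column name days_between, and the use of the built-in datediff in place of the toEquals udf are all assumptions made for illustration; they are not taken from the question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, datediff, to_date}

object WithColumnSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("withColumn-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data standing in for input_table.
    val inputTable = Seq(
      ("2019-01-01 08:00:00", "2019-01-03 17:30:00")
    ).toDF("start_date", "finish_date")

    // withColumn returns a new DataFrame; inputTable itself is not changed.
    // datediff is a stand-in here for the toEquals udf from the question.
    val withDiff = inputTable.withColumn(
      "days_between",
      datediff(to_date(col("finish_date")), to_date(col("start_date")))
    )

    assert(inputTable.columns.length == 2) // original keeps its columns
    assert(withDiff.columns.length == 3)   // new DataFrame has one more

    withDiff.printSchema()
    withDiff.show()
    spark.stop()
  }
}
```

If the result of withColumn is not assigned, calling printSchema() or show() on the original DataFrame will not include the new column.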
Update
To resolve the Task not serializable exception: objects passed to Spark must be serializable. Here the DATE_TIME_FORMATTER reference is created outside the udf and is not serializable. Try moving its instantiation inside the function:
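import java.time.LocalDateTime
import java.time.format.DateTimeFormatter
import org.apache.spark.sql.functions.udf

def toEquals = udf((rd1: String, rd2: String) => {
  // Build the formatter inside the udf so that nothing non-serializable is captured.
  val formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
  val d1 = adjust(LocalDateTime.parse(rd1, formatter))
  val d2 = adjust(LocalDateTime.parse(rd2, formatter), asc = false)
  // remaining code unchanged
})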
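Why moving the formatter helps can be seen without Spark. Spark ships closures to executors using Java serialization, and java.time.format.DateTimeFormatter does not implement java.io.Serializable, whereas the pattern string does. The following is a minimal sketch of that check; the object name and the helper isJavaSerializable are hypothetical names introduced for this illustration only.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

object FormatterSerializationSketch {
  // Hypothetical helper: returns true if Java serialization accepts the object.
  def isJavaSerializable(obj: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      out.writeObject(obj)
      out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }

  def main(args: Array[String]): Unit = {
    val pattern = "yyyy-MM-dd HH:mm:ss"
    val formatter = DateTimeFormatter.ofPattern(pattern)

    // The formatter object cannot be written with Java serialization.
    println(s"formatter serializable: ${isJavaSerializable(formatter)}")
    // The pattern string can, so it is safe to reference from a closure.
    println(s"pattern serializable:   ${isJavaSerializable(pattern)}")

    // Building the formatter at the point of use, as in the udf above.
    val parsed = LocalDateTime.parse("2019-01-01 08:00:00", DateTimeFormatter.ofPattern(pattern))
    assert(parsed.getHour == 8)
  }
}
```

Creating the formatter on every call has a small cost. If that matters, one common alternative is to hold it in a lazy val inside a standalone object, so that each executor initializes its own copy instead of receiving a serialized one.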
End of update