What data type should be used for a time column

Question

In my Spark application, I had to split the date and time and store them in separate columns as follows:

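import org.apache.spark.sql.functions.date_format

// Format the "date" column twice: once as the date part, once as the time part
val df5 = df4
  .withColumn("read_date", date_format(df4.col("date"), "yyyy-MM-dd"))
  .withColumn("read_time", date_format(df4.col("date"), "HH:mm:ss"))
  .drop("date")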

This command splits the date and the time:

+----------+---------+
| read_date|read_time|
+----------+---------+
|2012-01-12| 00:06:00|
+----------+---------+

However, both fields are created as String. I can use .cast("date") for the date column, but which data type should I use for the time column? If I use something like .cast("timestamp"), the current server date gets combined with the time. Since we are going to visualize the data in Power BI, is storing the time as a String the right approach?

Recommended answer

Spark has no DataType that holds 'HH:mm:ss' values. Instead, you can use the hour(), minute() and second() functions to extract each component as a separate value.

All of these functions return int values.

hour(string date) -- Returns the hour of the timestamp: hour('2009-07-30 12:58:59') = 12, hour('12:58:59') = 12.

minute(string date) -- Returns the minute of the timestamp.

second(string date) -- Returns the second of the timestamp.
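As a minimal sketch of this suggestion, the snippet below keeps the date as a real date column and stores the time as three integer columns. It assumes the input DataFrame has a timestamp column named "date", as in the question. The object name SplitTimeSketch, the helper splitDateTime, the output column names read_hour, read_minute and read_second, and the one-row sample data are hypothetical and only used for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, hour, minute, second, to_date}

object SplitTimeSketch {

  // Hypothetical helper: expects a timestamp column named "date" (an assumption
  // taken from the question) and replaces it with a date column plus three
  // integer columns for the time components.
  def splitDateTime(df: DataFrame): DataFrame =
    df.withColumn("read_date", to_date(col("date")))
      .withColumn("read_hour", hour(col("date")))
      .withColumn("read_minute", minute(col("date")))
      .withColumn("read_second", second(col("date")))
      .drop("date")

  def main(args: Array[String]): Unit = {
    // Local session for demonstration only
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("split-time-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample row standing in for df4 from the question
    val df4 = Seq("2012-01-12 00:06:00")
      .toDF("raw")
      .select(col("raw").cast("timestamp").as("date"))

    val df5 = splitDateTime(df4)
    df5.printSchema() // read_date should be a date, the other columns integers
    df5.show()

    spark.stop()
  }
}
```

With integer columns, a reporting tool can sort and filter on the time components numerically. Whether that is more convenient in Power BI than a single 'HH:mm:ss' string depends on the report, so treat this as one option rather than the definitive layout.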
