Spark date format issue


Question


I have observed weird behavior in Spark date formatting. I need to convert two-digit years (yy) to four-digit years (yyyy), and after conversion the year should always be 20yy.


I tried the following, but it fails for years after 2040:

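import org.apache.spark.sql.functions._
import spark.implicits._ // toDF and $"..." (already imported in spark-shell)

val df = Seq("06/03/35", "07/24/40", "11/15/43", "12/15/12", "11/15/20", "12/12/22").toDF("Date")

// MM = month of year (lowercase mm would mean minutes)
df.withColumn("newdate", from_unixtime(unix_timestamp($"Date", "MM/dd/yy"), "MM/dd/yyyy")).show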

+--------+----------+
|    Date|   newdate|
+--------+----------+
|06/03/35|06/03/2035|
|07/24/40|07/24/2040|
|11/15/43|11/15/1943|  // Here the year was resolved to 19xx
|12/15/12|12/15/2012|
|11/15/20|11/15/2020|
|12/12/22|12/12/2022|
+--------+----------+


Why does this happen? Is there a date utility function I can use directly, without prepending "20" to the date string?

Recommended answer


Parsing two-digit year strings involves a relative interpretation that is documented in the SimpleDateFormat docs:


For parsing with the abbreviated year pattern ("y" or "yy"), SimpleDateFormat must interpret the abbreviated year relative to some century. It does this by adjusting dates to be within 80 years before and 20 years after the time the SimpleDateFormat instance is created. For example, using a pattern of "MM/dd/yy" and a SimpleDateFormat instance created on Jan 1, 1997, the string "01/11/12" would be interpreted as Jan 11, 2012 while the string "05/04/64" would be interpreted as May 4, 1964.
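The documented example can be reproduced directly with `SimpleDateFormat`. This minimal sketch assumes that an instance created on Jan 1, 1997 behaves like one whose window start was set explicitly to Jan 1, 1917 (80 years earlier); `DocWindowExample.resolvedYear` is a hypothetical helper, not a library API.

```scala
import java.text.SimpleDateFormat
import java.util.{Calendar, GregorianCalendar, Locale}

object DocWindowExample {
  // Hypothetical helper: parse "MM/dd/yy" with the two-digit-year window starting on
  // Jan 1 of windowStartYear, and return the four-digit year the parser chose.
  def resolvedYear(input: String, windowStartYear: Int): Int = {
    // Locale.US pins the Gregorian calendar regardless of the JVM default locale.
    val format = new SimpleDateFormat("MM/dd/yy", Locale.US)
    format.set2DigitYearStart(
      new GregorianCalendar(windowStartYear, Calendar.JANUARY, 1).getTime)
    val cal = new GregorianCalendar()
    cal.setTime(format.parse(input))
    cal.get(Calendar.YEAR)
  }

  def main(args: Array[String]): Unit = {
    // An instance created on Jan 1, 1997 has a window starting 80 years earlier (1917).
    println(resolvedYear("01/11/12", 1917)) // 2012
    println(resolvedYear("05/04/64", 1917)) // 1964
  }
}
```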


Since 2043 is more than 20 years after the time the parser was created, it falls outside the window, so the parser resolves "43" to 1943, as documented.
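The window rule can also be written out as plain arithmetic. This is a minimal, year-level sketch (the hypothetical `CenturyWindow.resolve`, not a library API); it assumes the parser was created in 2020, so its window starts in 1940, and it ignores the month/day comparison the real parser applies to dates in the start year itself.

```scala
object CenturyWindow {
  // Year-level version of the SimpleDateFormat rule: pick the year ending in
  // twoDigitYear that lies in [startYear, startYear + 100). The real parser compares
  // full dates, so inputs falling in startYear itself can resolve either way.
  def resolve(twoDigitYear: Int, startYear: Int): Int = {
    val candidate = startYear - startYear % 100 + twoDigitYear // same century as startYear
    if (candidate < startYear) candidate + 100 else candidate
  }

  def main(args: Array[String]): Unit = {
    // Parser created in 2020: window 1940..2039 (80 years back, 20 forward).
    println(resolve(43, 1940)) // 1943, because 2043 is past the forward limit
    println(resolve(35, 1940)) // 2035
    // Window starting in 1980, as in the UDF approach.
    println(resolve(43, 1980)) // 2043
  }
}
```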


Here's one approach that uses a UDF that explicitly calls set2DigitYearStart on a SimpleDateFormat object before parsing the date (I picked 1980 just as an example):

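import java.sql.Date
import java.text.SimpleDateFormat
import java.util.Calendar

def parseDate(date: String, pattern: String): Date = {
    // New SimpleDateFormat per call: the class is not thread-safe.
    val format = new SimpleDateFormat(pattern)

    // Two-digit years are resolved into the 100 years starting Jan 1, 1980.
    val cal = Calendar.getInstance()
    cal.clear()
    cal.set(1980, Calendar.JANUARY, 1)
    format.set2DigitYearStart(cal.getTime)

    new Date(format.parse(date).getTime)
}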

Then:

import org.apache.spark.sql.functions.{lit, udf}

val custom_to_date = udf(parseDate _)
// MM = month of year (lowercase mm would mean minutes and leave every month as January)
df.withColumn("newdate", custom_to_date($"Date", lit("MM/dd/yy"))).show(false)

+--------+----------+
|Date    |newdate   |
+--------+----------+
|06/03/35|2035-06-03|
|07/24/40|2040-07-24|
|11/15/43|2043-11-15|
|12/15/12|2012-12-15|
|11/15/20|2020-11-15|
|12/12/22|2022-12-12|
+--------+----------+


Knowing your data, you can decide which start date to pass to set2DigitYearStart().
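If, as in the question, every two-digit year should simply become 20yy, a fixed base century may be simpler than a sliding window. This is a swapped-in technique, not the answer's: `java.time`'s `DateTimeFormatterBuilder.appendValueReduced` with a base value of 2000 maps "00" through "99" to 2000 through 2099. A minimal sketch in plain Scala (no Spark); `FixedCenturyExample.toFullYear` is a hypothetical helper.

```scala
import java.time.LocalDate
import java.time.format.{DateTimeFormatter, DateTimeFormatterBuilder}
import java.time.temporal.ChronoField

object FixedCenturyExample {
  // "MM/dd/yy" where the two-digit year is always read relative to base 2000,
  // so "43" becomes 2043 and "99" becomes 2099 regardless of today's date.
  val formatter: DateTimeFormatter = new DateTimeFormatterBuilder()
    .appendPattern("MM/dd/")
    .appendValueReduced(ChronoField.YEAR, 2, 2, 2000)
    .toFormatter()

  // Hypothetical helper: parse a "MM/dd/yy" string into a LocalDate in 2000-2099.
  def toFullYear(input: String): LocalDate = LocalDate.parse(input, formatter)

  def main(args: Array[String]): Unit = {
    Seq("06/03/35", "07/24/40", "11/15/43", "12/15/12", "11/15/20", "12/12/22")
      .foreach(s => println(s"$s -> ${toFullYear(s)}"))
  }
}
```

Inside Spark, `toFullYear` could be wrapped with `udf` the same way `parseDate` is, converting the result with `java.sql.Date.valueOf(...)`. The trade-off is that no year before 2000 can ever be produced.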
