Convert Date of Birth into Age in Spark Dataframe API

Question

This seems simple, but I couldn't find the answer. I'm trying to convert a date-of-birth column, stored as strings in the format shown below, to a date type in the Spark Dataframe API and then calculate the corresponding ages. I probably need the system date as well. I have found some Java libraries that may be useful, but I am still having difficulty using them with the dataframe API.

23-AUG-67
28-FEB-66
09-APR-59

9/10/2015 Edit: I just found that Spark 1.5.0 adds "Date Time Functions", which will be helpful in the future once 1.5.0 is available here. Unfortunately, it doesn't work with the current Spark version in AWS EMR.

9/10/2015 Evening Edit: I was able to convert the date of birth into an age using the code below.
Note that the getYear() function is deprecated, but as far as I can tell it works fine.

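import java.sql.Date
import java.text.SimpleDateFormat
import java.util.Locale
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.udf

val sqlsc = new SQLContext(sc)
import sqlsc.implicits._ // enables the 'dob symbol-to-column syntax

val epoch = System.currentTimeMillis
val curDate = new Date(epoch)
// Locale.ENGLISH so that month abbreviations such as "AUG" parse
// regardless of the JVM's default locale
val dtFormat = new SimpleDateFormat("dd-MMM-yy", Locale.ENGLISH)

// Age as a plain difference of calendar years (getYear is deprecated)
val dobToAge = udf( (dob: String) => {
  val javaUtilDate = dtFormat.parse(dob)
  val sqlDate = new Date(javaUtilDate.getTime())
  curDate.getYear - sqlDate.getYear
})

// inputdata is the DataFrame that holds the dob column
inputdata.withColumn("AGE", dobToAge('dob))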

Recommended answer

Instead of using the deprecated getXXX methods of java.util.Date, you should use java.util.Calendar.

Also, your solution doesn't work in all cases. If someone was born on December 31st, 1976, their age will be computed as 2015 - 1976 = 39, even though on January 1st, 2015 they won't turn 39 for almost a full year.

You should instead use a computation like the one shown in http://howtodoinjava.com/2014/05/26/java-code-to-calculate-age-from-date-of-birth/ (converting the Java code to Scala shouldn't be much of a problem).
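
For illustration, here is a minimal Scala sketch of a Calendar-based age computation. It is not the code from the linked article. The names AgeCalc, parseDob and ageOn are made up for this example, and Locale.ENGLISH is an assumption about the month abbreviations in the data. It takes the year difference and subtracts one when the birthday has not yet occurred in the reference year. Note that SimpleDateFormat resolves a two-digit "yy" year into the window from 80 years before to 20 years after the time the formatter is created, so very old birth dates may need separate handling.

```scala
import java.text.SimpleDateFormat
import java.util.{Calendar, Date, Locale}

// Hypothetical helper object, named for this example only.
object AgeCalc {
  // Parse strings such as "23-AUG-67". A new formatter is created per call
  // because SimpleDateFormat is not thread-safe.
  def parseDob(dob: String): Date =
    new SimpleDateFormat("dd-MMM-yy", Locale.ENGLISH).parse(dob)

  // Age in completed years of someone born on `dob`, evaluated on `asOf`.
  def ageOn(dob: Date, asOf: Date): Int = {
    val birth = Calendar.getInstance()
    birth.setTime(dob)
    val now = Calendar.getInstance()
    now.setTime(asOf)

    val years = now.get(Calendar.YEAR) - birth.get(Calendar.YEAR)
    // True when this year's birthday is still ahead of `asOf`.
    val birthdayPending =
      now.get(Calendar.MONTH) < birth.get(Calendar.MONTH) ||
        (now.get(Calendar.MONTH) == birth.get(Calendar.MONTH) &&
          now.get(Calendar.DAY_OF_MONTH) < birth.get(Calendar.DAY_OF_MONTH))
    if (birthdayPending) years - 1 else years
  }
}
```

Inside the UDF from the question, the body could then become something like AgeCalc.ageOn(AgeCalc.parseDob(dob), new Date()), assuming the object is available on the executors.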
