Convert Date of Birth into Age in Spark Dataframe API


Question

This seems simple, but I couldn't find the answer. I'm trying to convert a date-of-birth column in the format below into a date type in the Spark Dataframe API and then calculate the corresponding ages. I probably need the system date as well. I have found some Java libraries that may be useful, but I am still having difficulty using them with the Dataframe API.

23-AUG-67
28-FEB-66
09-APR-59

9/10/2015 Edit: I just found that Spark 1.5.0 adds "Date Time Functions", which will be helpful once 1.5.0 is released. Unfortunately, they don't work with the current Spark version in AWS EMR.

9/10/2015 Evening Edit: I was able to convert the date of birth into age using the code below.
Note that the getYear() function is deprecated, but as far as I can tell it works fine.

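import java.sql.Date
import java.text.SimpleDateFormat
import java.util.Locale
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.udf

val sqlsc = new SQLContext(sc)
import sqlsc.implicits._  // enables the 'dob symbol-to-column syntax

// Current system date, captured once on the driver
val epoch = System.currentTimeMillis
val curDate = new Date(epoch)
// Parses strings such as "23-AUG-67"; English month names are assumed
val dtFormat = new SimpleDateFormat("dd-MMM-yy", Locale.ENGLISH)

// Age as the difference of calendar years only (getYear is deprecated)
val dobToAge = udf( (dob: String) => {
  val javaUtilDate = dtFormat.parse(dob)
  val sqlDate = new Date(javaUtilDate.getTime())
  curDate.getYear - sqlDate.getYear
})

inputdata.withColumn("AGE", dobToAge('dob))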

Recommended answer

Instead of using the deprecated getXXX methods of java.util.Date, you should use java.util.Calendar.

Also, your solution doesn't work in all cases. If someone was born on December 31st, 1976, their age will be computed as 2015 - 1976 = 39, even though on January 1st, 2015 they are still 38 and won't turn 39 for almost a full year.

You should instead use a computation like the one shown in http://howtodoinjava.com/2014/05/26/java-code-to-calculate-age-from-date-of-birth/ (converting the Java code to Scala shouldn't be much of a problem).
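As a rough illustration of that kind of computation, here is a minimal Scala sketch based on java.util.Calendar. It is not the code from the linked article; the object AgeCalc and the helpers ageInYears and parseDob are hypothetical names introduced only for this example. The idea is to take the difference in calendar years and subtract one if the birthday has not yet occurred in the current year. It assumes English month abbreviations and relies on SimpleDateFormat's default handling of two-digit years.

```scala
import java.text.SimpleDateFormat
import java.util.{Calendar, Date, Locale}

object AgeCalc {
  // Hypothetical helper: number of full years between dob and asOf.
  def ageInYears(dob: Date, asOf: Date): Int = {
    val birth = Calendar.getInstance()
    birth.setTime(dob)
    val now = Calendar.getInstance()
    now.setTime(asOf)

    val yearDiff = now.get(Calendar.YEAR) - birth.get(Calendar.YEAR)

    // True when the birthday has not yet been reached in the asOf year.
    val birthdayPending =
      now.get(Calendar.MONTH) < birth.get(Calendar.MONTH) ||
        (now.get(Calendar.MONTH) == birth.get(Calendar.MONTH) &&
          now.get(Calendar.DAY_OF_MONTH) < birth.get(Calendar.DAY_OF_MONTH))

    if (birthdayPending) yearDiff - 1 else yearDiff
  }

  // Hypothetical helper: parses strings such as "23-AUG-67".
  // A new formatter is created per call because SimpleDateFormat is not
  // thread-safe. Two-digit years are resolved by SimpleDateFormat relative
  // to the current date, so the century is an assumption.
  def parseDob(s: String): Date =
    new SimpleDateFormat("dd-MMM-yy", Locale.ENGLISH).parse(s)
}
```

For the case above, calling AgeCalc.ageInYears with a birth date of 1976-12-31 and a reference date of 2015-01-01 gives 38 rather than 39. In Spark, such a helper could presumably be wrapped with udf in the same way as dobToAge in the question, passing the current date as the second argument.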
