How to extract year from a date string?

Question

I'm using Spark 2.1.2.

I'm working with datetime data and would like to get the year from a date string using Spark SQL functions.

The code I'm using is as follows:

import org.apache.spark.sql.functions._
import org.apache.spark.sql._
import org.apache.spark.sql.types._
val spark: SparkSession = SparkSession.builder().
appName("myapp").master("local").getOrCreate()

case class Person(id: Int, date: String)
import spark.implicits._

val mydf: DataFrame = Seq(Person(1,"9/16/13")).toDF()
val select_df: DataFrame = mydf.select(unix_timestamp(mydf("date"),"MM/dd/yy").cast(TimestampType))
select_df.select(year($"date")).show()

In the example above, I expect the year of the date to be 13.

Actual: org.apache.spark.sql.AnalysisException: cannot resolve 'date' given input columns: [CAST(unix_timestamp(date, MM/dd/yy) AS TIMESTAMP)];; 'Project [year('date) AS year(date)#11]

Recommended answer

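import org.apache.spark.sql.functions.{to_timestamp, year}
case class Person(id: Int, date: String)
val mydf = Seq(Person(1,"9/16/13")).toDF
// parse the string with the given pattern, then extract the year as an Int
val solution = mydf.withColumn("year", year(to_timestamp($"date", "MM/dd/yy")))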
scala> solution.show
+---+-------+----+
| id|   date|year|
+---+-------+----+
|  1|9/16/13|2013|
+---+-------+----+
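Note that `to_timestamp` was added in Spark 2.2.0, so it is not available on the Spark 2.1.2 mentioned in the question. Below is a minimal sketch of the same idea that should also work on 2.1.x. It assumes a local Spark session with `spark-sql` on the classpath; the names `YearOnSpark21` and `withYear` are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, unix_timestamp, year}
import org.apache.spark.sql.types.TimestampType

object YearOnSpark21 {
  case class Person(id: Int, date: String)

  // Hypothetical helper: adds a "year" column without to_timestamp.
  // unix_timestamp parses the string into epoch seconds (a Long), and casting
  // that Long to TimestampType yields a value that year() accepts.
  def withYear(df: DataFrame): DataFrame =
    df.withColumn("year", year(unix_timestamp(col("date"), "MM/dd/yy").cast(TimestampType)))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("year-sketch").master("local[*]").getOrCreate()
    import spark.implicits._

    withYear(Seq(Person(1, "9/16/13")).toDF()).show()
    spark.stop()
  }
}
```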

It looks like year gives you a four-digit year rather than two digits. I'm leaving the truncation to two digits as a homework exercise for you :)
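For the two-digit year, one option is integer arithmetic on the result of `year`. Another is formatting the timestamp with `date_format` and the `yy` pattern. The sketch below shows both, assuming the same `MM/dd/yy` input. It parses with `unix_timestamp(...).cast(TimestampType)` so that it also runs on Spark 2.1.x. The names `TwoDigitYear`, `withTwoDigitYear`, `yy_int` and `yy_str` are made up for illustration.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, date_format, unix_timestamp, year}
import org.apache.spark.sql.types.TimestampType

object TwoDigitYear {
  case class Person(id: Int, date: String)

  // Parse "MM/dd/yy" strings into a timestamp column.
  def parsed(c: Column): Column = unix_timestamp(c, "MM/dd/yy").cast(TimestampType)

  def withTwoDigitYear(df: DataFrame): DataFrame =
    df.withColumn("yy_int", year(parsed(col("date"))) % 100)         // Int, e.g. 2013 % 100 = 13
      .withColumn("yy_str", date_format(parsed(col("date")), "yy"))  // String, e.g. "13"

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("yy-sketch").master("local[*]").getOrCreate()
    import spark.implicits._

    withTwoDigitYear(Seq(Person(1, "9/16/13")).toDF()).show()
    spark.stop()
  }
}
```

The two approaches differ for a year such as 2005: `% 100` yields the integer 5, while `date_format` yields the string "05".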

Actual: org.apache.spark.sql.AnalysisException: cannot resolve 'date' given input columns: [CAST(unix_timestamp(date, MM/dd/yy) AS TIMESTAMP)];; 'Project [year('date) AS year(date)#11]

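The exception occurs because you try to access the "old" date column (in select(year($"date"))), which is no longer available after the first select (select(unix_timestamp(mydf("date"),"MM/dd/yy").cast(TimestampType))). That select returns a single column with the auto-generated name CAST(unix_timestamp(date, MM/dd/yy) AS TIMESTAMP), as the error message shows.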
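One way to see this is to inspect the schema right after the first select. The sketch below reproduces that select in isolation, assuming a local Spark session. The object and method names are hypothetical, and the exact generated column name may vary between Spark versions.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.unix_timestamp
import org.apache.spark.sql.types.TimestampType

object InspectColumns {
  case class Person(id: Int, date: String)

  // Same select as in the question: it keeps only one unnamed expression,
  // so Spark gives the column a generated name and "date" disappears.
  def firstSelect(mydf: DataFrame): DataFrame =
    mydf.select(unix_timestamp(mydf("date"), "MM/dd/yy").cast(TimestampType))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("inspect-sketch").master("local[*]").getOrCreate()
    import spark.implicits._

    val select_df = firstSelect(Seq(Person(1, "9/16/13")).toDF())
    select_df.printSchema()                    // a single timestamp column with a generated name
    println(select_df.columns.mkString(", "))  // "date" is not in this list
    spark.stop()
  }
}
```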

您可以使用aliasas将看起来很奇怪的自动生成的名称再次更改为类似date的名称,这将起作用.

You could use alias or as to rename the weird-looking auto-generated column back to something like date, and then select(year($"date")) would work.
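A minimal sketch of that fix, keeping the question's two-step structure and only adding `.as("date")` to the first select. It assumes a local Spark session; `AliasFix` and `yearViaAlias` are hypothetical names.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, unix_timestamp, year}
import org.apache.spark.sql.types.TimestampType

object AliasFix {
  case class Person(id: Int, date: String)

  def yearViaAlias(mydf: DataFrame): DataFrame = {
    // Same first select as in the question, but the result is renamed back to "date".
    val select_df = mydf.select(
      unix_timestamp(mydf("date"), "MM/dd/yy").cast(TimestampType).as("date"))
    // "date" now resolves, so this select no longer throws AnalysisException.
    select_df.select(year(col("date")))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("alias-sketch").master("local[*]").getOrCreate()
    import spark.implicits._

    yearViaAlias(Seq(Person(1, "9/16/13")).toDF()).show()
    spark.stop()
  }
}
```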