Filtering a Spark DataFrame based on date

Question

I have a DataFrame with columns of type

date, string, string

I want to select dates before a certain period. I tried the following, with no luck:

 data.filter(data("date") < new java.sql.Date(format.parse("2015-03-14").getTime))

I'm getting the following error:

org.apache.spark.sql.AnalysisException: resolved attribute(s) date#75 missing from date#72,uid#73,iid#74 in operator !Filter (date#75 < 16508);
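
A note on reading this error: the 16508 in `date#75 < 16508` appears to be the date literal in Spark's internal DateType representation, which is a count of days since 1970-01-01. The check below, in plain Scala with no Spark dependency, confirms that 2015-03-14 is day 16508. That suggests the literal was parsed as intended and the complaint concerns the column reference date#75, not the date value. The object and method names are illustrative only.

```scala
import java.time.LocalDate

object EpochDayCheck {
  // Number of days between 1970-01-01 and the given ISO-8601 date.
  def epochDay(isoDate: String): Long = LocalDate.parse(isoDate).toEpochDay

  def main(args: Array[String]): Unit = {
    println(epochDay("2015-03-14")) // prints 16508
  }
}
```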

As far as I can tell, the query is incorrect. Can anyone show me how the query should be written?

I checked that all entries in the DataFrame have values, and they do.

Recommended answer

The following solutions are applicable since Spark 1.5:

For less than:

// filter data where the date is earlier than 2015-03-14
data.filter(data("date").lt(lit("2015-03-14")))

For greater than:

// filter data where the date is later than 2015-03-14
data.filter(data("date").gt(lit("2015-03-14")))

For equality, you can use either equalTo or ===:

data.filter(data("date") === lit("2015-03-14"))
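
The three comparisons above can be tried end to end with a minimal sketch like the one below. It rests on assumptions not in the original post: a local `SparkSession` (Spark 2.x or later, whereas the answer targets 1.5), made-up sample rows, and illustrative helper names (`sample`, `before`, `after`, `on`). The column names date, uid, iid are taken from the error message in the question.

```scala
import java.sql.Date
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.lit

object DateFilterSketch {
  // Assumption: a local session, for illustration only.
  lazy val spark: SparkSession = SparkSession.builder()
    .master("local[*]")
    .appName("date-filter-sketch")
    .getOrCreate()

  // Hypothetical sample rows with a DateType column named "date".
  def sample(): DataFrame = {
    import spark.implicits._
    Seq(
      (Date.valueOf("2015-03-01"), "u1", "i1"),
      (Date.valueOf("2015-03-14"), "u2", "i2"),
      (Date.valueOf("2015-04-02"), "u3", "i3")
    ).toDF("date", "uid", "iid")
  }

  // Rows strictly earlier than the cutoff (ISO yyyy-MM-dd string).
  def before(df: DataFrame, cutoff: String): DataFrame =
    df.filter(df("date").lt(lit(cutoff)))

  // Rows strictly later than the cutoff.
  def after(df: DataFrame, cutoff: String): DataFrame =
    df.filter(df("date").gt(lit(cutoff)))

  // Rows exactly on the cutoff date.
  def on(df: DataFrame, cutoff: String): DataFrame =
    df.filter(df("date") === lit(cutoff))

  def main(args: Array[String]): Unit = {
    val data = sample()
    before(data, "2015-03-14").show()
    after(data, "2015-03-14").show()
    on(data, "2015-03-14").show()
    spark.stop()
  }
}
```

Note that each helper takes the column from the same DataFrame it filters (`df("date")` inside `df.filter`). Referencing a column from a different DataFrame is a common cause of "resolved attribute(s) ... missing" errors, though the question does not show enough code to say whether that is what happened there.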

If your DataFrame date column is of type StringType, you can convert it using the to_date function:

// filter data where the date is greater than 2015-03-14
data.filter(to_date(data("date")).gt(lit("2015-03-14"))) 
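
As a rough sketch of the StringType case, the example below builds a DataFrame whose date column holds ISO-formatted strings and filters it through `to_date`. The local `SparkSession`, the sample rows, and the names `StringDateFilterSketch`, `sample`, and `after` are assumptions made for illustration; only `to_date`, `gt`, and `lit` come from the answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{lit, to_date}

object StringDateFilterSketch {
  // Assumption: a local session, for illustration only.
  lazy val spark: SparkSession = SparkSession.builder()
    .master("local[*]")
    .appName("string-date-filter-sketch")
    .getOrCreate()

  // Hypothetical rows where "date" is a StringType column in yyyy-MM-dd form.
  def sample(): DataFrame = {
    import spark.implicits._
    Seq(
      ("2015-03-01", "u1", "i1"),
      ("2015-04-02", "u2", "i2")
    ).toDF("date", "uid", "iid")
  }

  // Convert the string column to a date, then keep rows later than the cutoff.
  def after(df: DataFrame, cutoff: String): DataFrame =
    df.filter(to_date(df("date")).gt(lit(cutoff)))

  def main(args: Array[String]): Unit = {
    val data = sample()
    data.printSchema() // "date" is reported as string here
    after(data, "2015-03-14").show()
    spark.stop()
  }
}
```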

You can also filter by year using the year function:

// filter data where year is greater or equal to 2016
data.filter(year($"date").geq(lit(2016))) 
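
The `$"date"` syntax only compiles when the session's implicits are in scope. A minimal sketch of the year filter that avoids that dependency by using `col("date")` might look like this; the local `SparkSession`, the sample rows, and the names `YearFilterSketch`, `sample`, and `fromYear` are assumptions, not part of the original answer.

```scala
import java.sql.Date
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, year}

object YearFilterSketch {
  // Assumption: a local session, for illustration only.
  lazy val spark: SparkSession = SparkSession.builder()
    .master("local[*]")
    .appName("year-filter-sketch")
    .getOrCreate()

  // Hypothetical rows on either side of the 2015/2016 boundary.
  def sample(): DataFrame = {
    import spark.implicits._
    Seq(
      (Date.valueOf("2015-12-31"), "u1", "i1"),
      (Date.valueOf("2016-01-01"), "u2", "i2")
    ).toDF("date", "uid", "iid")
  }

  // Keep rows whose year is greater than or equal to minYear.
  def fromYear(df: DataFrame, minYear: Int): DataFrame =
    df.filter(year(col("date")).geq(lit(minYear)))

  def main(args: Array[String]): Unit = {
    fromYear(sample(), 2016).show()
    spark.stop()
  }
}
```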
