Sparklyr - Changing date format in Spark


Question

I have a Spark dataframe with a character column that holds dates such as 20/01/2000 (day/month/year).

I'm trying to convert it to a date format so that I can use the functions listed here: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions to get only the data I want (for example, to extract months and days).

But it seems the functions only work when the dates are in another format, such as 1970-01-30.

Example:

sc <- spark_connect(master = "spark://XXXX")
df <- data.frame(date = c("20/10/2010", "19/11/2010"))
df_tbl <- copy_to(sc, df, "df")

If I want to extract only the month into a new column:

df_tbl <- df_tbl %>% mutate(month = month(date))

I get:

> df_tbl %>% glimpse()
Observations: 2
Variables: 2
$ date  <chr> "20/10/2010", "19/11/2010"
$ month <int> NA, NA

Since R's as.Date() function doesn't work here, I'd have to use another tool.

Any clues?

Recommended answer

As you have already worked out, this fails because 19/11/2010 is not in a date format that Spark accepts by default (it expects the yyyy-MM-dd layout, as in 1970-01-30). In Spark 2.2 or later you can pass the format to to_date:

df_tbl %>% mutate(month = month(to_date(date, "dd/MM/yyyy")))

# # Source:   lazy query [?? x 2]
# # Database: spark_connection
#   date       month
#   <chr>      <int>
# 1 20/10/2010    10
# 2 19/11/2010    11

In Spark 2.1 or earlier:

df_tbl %>% 
  mutate(month = month(from_unixtime(unix_timestamp(date, "dd/MM/yyyy"))))

# # Source:   lazy query [?? x 2]
# # Database: spark_connection
#   date       month
#   <chr>      <int>
# 1 20/10/2010    10
# 2 19/11/2010    11

And to only change the format of the string:

df_tbl %>%  
   mutate(formatted = from_unixtime(
     unix_timestamp(date, "dd/MM/yyyy"), "dd-MM-yyyy"))

# # Source:   lazy query [?? x 2]
# # Database: spark_connection
#   date       formatted 
#   <chr>      <chr>     
# 1 20/10/2010 20-10-2010
# 2 19/11/2010 19-11-2010
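
If the data start out as a local R data frame, as in the question's example, another option is to parse the strings in base R before calling copy_to. This is a minimal sketch under that assumption; it does not help when the data already live in Spark. The column names date_iso, month and day are hypothetical names chosen for illustration.

```r
# Local data frame from the question; dates are day/month/year strings.
df <- data.frame(date = c("20/10/2010", "19/11/2010"),
                 stringsAsFactors = FALSE)

# Parse with an explicit format. "%d/%m/%Y" is the strptime counterpart
# of the Java-style pattern "dd/MM/yyyy" used on the Spark side.
parsed <- as.Date(df$date, format = "%d/%m/%Y")

# Store the dates as yyyy-MM-dd strings (hypothetical column name),
# the layout the question reports as working, e.g. 1970-01-30.
df$date_iso <- format(parsed, "%Y-%m-%d")

# Month and day can also be extracted locally for comparison.
df$month <- as.integer(format(parsed, "%m"))
df$day <- as.integer(format(parsed, "%d"))

print(df)
```

After this, copying df to Spark and calling month(date_iso) inside mutate should behave like the 1970-01-30 case described in the question, since the strings are already in the yyyy-MM-dd layout.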
