Convert Double to Date using Spark in R


Question

I have an R data frame as shown below:

Date @AD.CC_CC @AD.CC_CC.1 @CL.CC_CC @CL.CC_CC.1
2018-02-05      -380        -380     -1580       -1580
2018-02-06        20          20      -280        -280
2018-02-07      -700        -700     -1730       -1730
2018-02-08      -460        -460     -1100       -1100
2018-02-09       260         260     -1780       -1780
2018-02-12       480         480       380         380

I use the copy_to function to copy the data frame to Spark. After the copy, every column, including Date, is shown as double.

# Source:   lazy query [?? x 5]
# Database: spark_connection
Date AD_CC_CC AD_CC_CC_1 CL_CC_CC CL_CC_CC_1
<dbl>    <dbl>      <dbl>    <dbl>      <dbl>
17567     -380       -380    -1580      -1580
17568       20         20     -280       -280
17569     -700       -700    -1730      -1730
17570     -460       -460    -1100      -1100
17571      260        260    -1780      -1780
17574      480        480      380        380

I am trying to convert it back to Date using the command below, but it throws an error.

marketdata_spark %>% mutate(Date = as.Date(Date))
Error: org.apache.spark.sql.AnalysisException: cannot resolve 'CAST(marketdata.`Date` AS DATE)' due to data type mismatch: cannot cast double to date; line 1 pos 59;

I'm not sure how to proceed.

Recommended answer

This looks like a sparklyr bug. The simplest workaround is to cast dates to character before calling copy_to:

df <- tibble::tibble(Date=as.Date(c("2018-02-05", "2018-02-06")))
sdf <- df %>% mutate(Date = as.character(Date)) %>% copy_to(sc, .)

sdf

# Source:   table<sparklyr_11ae23aa677e> [?? x 1]
# Database: spark_connection
  Date      
  <chr>     
1 2018-02-05
2 2018-02-06

and cast it back to a date later:

sdf %>% mutate(Date = to_date(Date))

# Source:   lazy query [?? x 1]
# Database: spark_connection
  Date      
  <date>    
1 2018-02-05
2 2018-02-06

You can also try using the numeric value as a day offset from the start of the Unix epoch:

sdf <- df  %>% copy_to(sc, .)

sdf

# Source:   table<sparklyr_13ab19ec6f53> [?? x 1]
# Database: spark_connection
   Date
  <dbl>
1 17567
2 17568

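# as.integer() casts the day offset to INT; Spark 3.0+ rejects a fractional second argument to date_add
sdf %>% mutate(Date = date_add(to_date("1970-01-01"), as.integer(Date)))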

# Source:   lazy query [?? x 1]
# Database: spark_connection
  Date      
 <date>    
1 2018-02-05
2 2018-02-06
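
The offset approach works because an R Date is stored as the number of days since 1970-01-01, which is where values such as 17567 come from. A minimal base R sketch of that round trip follows; it needs no Spark connection, and the variable names (d, offsets, restored) are illustrative only:

```r
# Base R only; no Spark connection is needed for this check.
d <- as.Date(c("2018-02-05", "2018-02-06"))

# A Date is stored as the number of days since 1970-01-01.
offsets <- as.numeric(d)
print(offsets)

# Adding the offsets back to the epoch recovers the original dates,
# which mirrors date_add(to_date("1970-01-01"), Date) on the Spark side.
restored <- as.Date(offsets, origin = "1970-01-01")
print(restored)
```

The same arithmetic is what Spark performs when the day count is added to the epoch date, so the numeric column produced by copy_to carries enough information to rebuild the dates.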

Alternatively, you can skip copy_to completely (it has very limited applications anyway and is seldom useful in production) and use one of the built-in input formats (spark_read_*).
