Change string in DF using Hive command and mutate with sparklyr

Views: 168 | Posted: 2018/6/12 13:38:40 | Tags: r, apache-spark, hive, gsub, sparklyr


Problem description

Using the Hive command regexp_extract, I am trying to change the following strings from:
201703170455 to 2017-03-17:04:55
and from:
2017031704555675 to 2017-03-17:04:55.0010
I am doing this in sparklyr and trying to use this code, which works with gsub in R:
newdf<-df%>%mutate(Time1 = regexp_extract(Time, "(....)(..)(..)(..)(..)", "\\1-\\2-\\3:\\4:\\5"))
and this code:
newdf<-df%>mutate(TimeTrans = regexp_extract("(....)(..)(..)(..)(..)(....)", "\\1-\\2-\\3:\\4:\\5.\\6"))
but neither works at all. Any suggestions on how to do this using regexp_extract?
Solution
Apache Spark uses the Java regular expression dialect, not R's, and groups should be referenced with $. Furthermore, regexp_extract is used to extract a single group by a numeric index, so it cannot rebuild a string from several groups.
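Since the point above is about the Java regex dialect, it can be checked in plain Java without a Spark session. The following is only a sketch: the class and method names are hypothetical, and it shows java.util.regex behaviour alone, not how sparklyr or Spark SQL escape the strings on the way to the engine.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative and not part of Spark or sparklyr.
class GroupReferenceDemo {
    static final String PATTERN = "^(....)(..)(..)(..)(..)$";

    // Replace-style call: the match is rewritten and groups are referenced as $n.
    static String replaceWithDollar(String input) {
        return input.replaceAll(PATTERN, "$1-$2-$3 $4:$5");
    }

    // R-style references: the replacement string seen by the regex engine is
    // \1-\2-\3 \4:\5. In java.util.regex a backslash in a replacement only
    // escapes the next character, so the digits come out literally.
    static String replaceWithBackslash(String input) {
        return input.replaceAll(PATTERN, "\\1-\\2-\\3 \\4:\\5");
    }

    // Extract-style call: returns one capture group chosen by numeric index.
    // Returning "" on no match is a choice made for this sketch.
    static String extractGroup(String input, int index) {
        Matcher m = Pattern.compile(PATTERN).matcher(input);
        return m.find() ? m.group(index) : "";
    }

    public static void main(String[] args) {
        String raw = "201703170455";
        System.out.println(replaceWithDollar(raw));    // 2017-03-17 04:55
        System.out.println(replaceWithBackslash(raw)); // 1-2-3 4:5
        System.out.println(extractGroup(raw, 1));      // 2017
    }
}
```

The extract-style helper can only hand back one group per call, which is why a replace-style call is the better fit for reformatting the whole string.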

You can use regexp_replace:
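library(sparklyr)
library(dplyr)

# sc is assumed to be an existing Spark connection (the output below came from a local one)
df <- data.frame(time = c("201703170455", "2017031704555675"))
sdf <- copy_to(sc, df)

sdf %>%
  # 12-character values: yyyyMMddHHmm -> yyyy-MM-dd HH:mm
  mutate(time1 = regexp_replace(
    time, "^(....)(..)(..)(..)(..)$", "$1-$2-$3 $4:$5")) %>%
  # 16-character values: the four trailing characters are kept after a dot
  mutate(time2 = regexp_replace(
    time, "^(....)(..)(..)(..)(..)(....)$", "$1-$2-$3 $4:$5.$6"))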

Source:   query [2 x 3]
Database: spark connection master=local[8] app=sparklyr local=TRUE

# A tibble: 2 x 3
              time            time1                 time2
             <chr>            <chr>                 <chr>
1     201703170455 2017-03-17 04:55          201703170455
2 2017031704555675 2017031704555675 2017-03-17 04:55.5675
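In the output above, each anchored pattern rewrites only the values of matching length and leaves the others unchanged, which is why time1 and time2 each contain one untouched value. If a single formatted column is wanted, the two replacements could be chained. The sketch below, in plain Java with hypothetical names, assumes digit-only inputs and swaps the dots for \d. That swap is a deliberate change from the answer's patterns: the 12-digit result "2017-03-17 04:55" is itself 16 characters long, so the dot-based 16-character pattern would match it again.

```java
// Hypothetical demo class; names are illustrative and not part of Spark or sparklyr.
class ChainedTimestampDemo {
    // Digit-only variants of the patterns (an assumption made for this sketch).
    static final String DIGITS_12 = "^(\\d{4})(\\d{2})(\\d{2})(\\d{2})(\\d{2})$";
    static final String DIGITS_16 = "^(\\d{4})(\\d{2})(\\d{2})(\\d{2})(\\d{2})(\\d{4})$";

    // The original dot-based patterns, kept to show the pitfall when chained.
    static final String DOTS_12 = "^(....)(..)(..)(..)(..)$";
    static final String DOTS_16 = "^(....)(..)(..)(..)(..)(....)$";

    // Applies both replacements in sequence; at most one of them matches
    // because the patterns are anchored and accept digits only.
    static String format(String input) {
        return input
            .replaceAll(DIGITS_12, "$1-$2-$3 $4:$5")
            .replaceAll(DIGITS_16, "$1-$2-$3 $4:$5.$6");
    }

    // Same chain with the dot-based patterns: the second step also matches
    // the already formatted 12-digit result and rewrites it a second time.
    static String formatWithDots(String input) {
        return input
            .replaceAll(DOTS_12, "$1-$2-$3 $4:$5")
            .replaceAll(DOTS_16, "$1-$2-$3 $4:$5.$6");
    }

    public static void main(String[] args) {
        System.out.println(format("201703170455"));         // 2017-03-17 04:55
        System.out.println(format("2017031704555675"));     // 2017-03-17 04:55.5675
        System.out.println(formatWithDots("201703170455")); // no longer the intended format
    }
}
```

Whether the same chaining is convenient in a sparklyr pipeline depends on how the nested calls are translated, so treat this as an illustration of the regex behaviour only.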
