Change string in DF using Hive command and mutate with sparklyr


Problem description

Using the Hive command regexp_extract I am trying to change the following strings from:

201703170455 to 2017-03-17:04:55

and from:

2017031704555675 to 2017-03-17:04:55.0010

I am doing this in sparklyr, trying to use this code, which works with gsub in R:

  newdf<-df%>%mutate(Time1 = regexp_extract(Time, "(....)(..)(..)(..)(..)", "\\1-\\2-\\3:\\4:\\5"))

and this code:

newdf<-df%>mutate(TimeTrans = regexp_extract("(....)(..)(..)(..)(..)(....)", "\\1-\\2-\\3:\\4:\\5.\\6"))

but neither attempt works. Any suggestions on how to do this using regexp_extract?

Solution

Apache Spark uses the Java regular expression dialect, not R's, and groups should be referenced with $. Furthermore, regexp_extract is used to extract a single group by a numeric index.
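
To see the difference outside Spark, here is a minimal plain-Java sketch, assuming the patterns end up in java.util.regex as described. The class and method names (GroupReferenceSketch, extractGroup, replace) are made up for illustration; they only roughly mirror regexp_extract and regexp_replace and are not Spark or sparklyr APIs.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of Spark or sparklyr.
class GroupReferenceSketch {

    // Rough analogue of regexp_extract(str, regex, idx): return one capture
    // group, or an empty string when the pattern is not found (assumption).
    static String extractGroup(String input, String regex, int idx) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(idx) : "";
    }

    // Rough analogue of regexp_replace(str, regex, replacement).
    static String replace(String input, String regex, String replacement) {
        return input.replaceAll(regex, replacement);
    }

    public static void main(String[] args) {
        String time = "201703170455";
        String regex = "^(....)(..)(..)(..)(..)$";

        // One group selected by numeric index.
        System.out.println(extractGroup(time, regex, 1)); // 2017

        // Java-style references: $n inserts the n-th group.
        System.out.println(replace(time, regex, "$1-$2-$3 $4:$5")); // 2017-03-17 04:55

        // R/gsub-style references: in a Java replacement a backslash only
        // escapes the next character, so \1 becomes a literal "1".
        System.out.println(replace(time, regex, "\\1-\\2-\\3:\\4:\\5")); // 1-2-3:4:5
    }
}
```

The last call shows why a gsub-style replacement string cannot simply be reused: in Java the backslash forms are not group references at all.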

You can use regexp_replace:

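# Assumed setup, not shown in the original answer: load the packages and
# open a local connection (master chosen to match the output below)
library(sparklyr)
library(dplyr)
sc <- spark_connect(master = "local[8]")

df <- data.frame(time = c("201703170455", "2017031704555675"))
sdf <- copy_to(sc, df)

# Both patterns are anchored with ^ and $, so a value of the wrong length
# does not match and is returned unchanged
sdf %>% 
  mutate(time1 = regexp_replace(
    time, "^(....)(..)(..)(..)(..)$", "$1-$2-$3 $4:$5" )) %>%
  mutate(time2 = regexp_replace(
    time, "^(....)(..)(..)(..)(..)(....)$", "$1-$2-$3 $4:$5.$6"))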

Source:   query [2 x 3]
Database: spark connection master=local[8] app=sparklyr local=TRUE

# A tibble: 2 x 3
              time            time1                 time2
             <chr>            <chr>                 <chr>
1     201703170455 2017-03-17 04:55          201703170455
2 2017031704555675 2017031704555675 2017-03-17 04:55.5675
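
The pass-through values in the table (the 16-digit string left untouched in time1, the 12-digit one in time2) follow from the ^ and $ anchors. A small plain-Java sketch, again assuming Spark applies java.util.regex semantics, reproduces the four cells; TimestampFormatSketch, time1 and time2 are hypothetical names chosen to match the column names above.

```java
// Hypothetical helper class, not part of Spark or sparklyr.
class TimestampFormatSketch {

    // Same patterns as in the answer: 12 and 16 characters respectively.
    static final String MINUTES = "^(....)(..)(..)(..)(..)$";
    static final String FRACTION = "^(....)(..)(..)(..)(..)(....)$";

    // Mirrors the time1 column: reformat 12-character values only.
    static String time1(String t) {
        return t.replaceAll(MINUTES, "$1-$2-$3 $4:$5");
    }

    // Mirrors the time2 column: reformat 16-character values only.
    static String time2(String t) {
        return t.replaceAll(FRACTION, "$1-$2-$3 $4:$5.$6");
    }

    public static void main(String[] args) {
        String[] inputs = {"201703170455", "2017031704555675"};
        for (String t : inputs) {
            // A value whose length does not fit the anchored pattern
            // has no match, so replaceAll returns it unchanged.
            System.out.println(t + " | " + time1(t) + " | " + time2(t));
        }
    }
}
```

Because `.` matches any character, these patterns only check length; swapping each `.` for `\\d` would be a stricter variant if non-digit input is possible.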
