How to remove '\' from a string in sparklyr

Question

I am using sparklyr and have a Spark data frame with a column word that contains words, some of which contain special characters that I want to remove. I was successful using regexp_replace with \\\\ before each special character, like this:

words.sdf <- words.sdf %>% 
  mutate(word = regexp_replace(word, '\\\\(', '')) %>% 
  mutate(word = regexp_replace(word, '\\\\)', '')) %>% 
  mutate(word = regexp_replace(word, '\\\\+', '')) %>% 
  mutate(word = regexp_replace(word, '\\\\?', '')) %>%
  mutate(word = regexp_replace(word, '\\\\:', '')) %>%
  mutate(word = regexp_replace(word, '\\\\;', '')) %>%
  mutate(word = regexp_replace(word, '\\\\!', ''))

Now I want to remove \. I have tried both:

words.sdf <- words.sdf %>% 
  mutate(word = regexp_replace(word, '\\\\\', ''))

and:

words.sdf <- words.sdf %>% 
  mutate(word = regexp_replace(word, '\', ''))

but neither works.

Recommended answer

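You have to account for escaping on both the R side and the Java side, so what you actually need is "\\\\\\\\". R's parser reduces the eight typed backslashes to four, Spark's SQL string parsing should reduce those to two, and the regex engine then reads \\ as a single literal backslash: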

df <- copy_to(sc, tibble(word = "(abc\\zyx: 1)"))

df %>% mutate(regexp_replace(word, "\\\\\\\\", ""))

# Source:   lazy query [?? x 2]
# Database: spark_shell_connection
  word            `regexp_replace(word, "\\\\\\\\\\\\\\\\", "")`
  <chr>           <chr>                                         
1 "(abc\\zyx: 1)" (abczyx: 1)
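
The layers of escaping can be checked locally in base R, without a Spark connection. This is only a sketch: the middle layer (Spark SQL unescaping the string literal) is simulated by halving the backslash count, which is an assumption about Spark's default parser behaviour, and base R's regex engine stands in for Java's.

```r
# The pattern exactly as typed in the R source: 8 backslash characters.
pattern_r <- "\\\\\\\\"

# Layer 1: R's parser turns each typed pair into one backslash, leaving 4.
nchar(pattern_r)
# 4

# Layer 2 (assumed): Spark SQL unescapes the literal once more, leaving 2.
# Simulated here by building a string with half as many backslashes.
pattern_regex <- strrep("\\", nchar(pattern_r) / 2)
nchar(pattern_regex)
# 2

# Layer 3: the regex engine reads the remaining two backslashes as one
# literal backslash. Base R's gsub() is used as a stand-in for Java's engine.
x <- "(abc\\zyx: 1)"   # contains a single backslash
cleaned <- gsub(pattern_regex, "", x)
cleaned
# "(abczyx: 1)"
```

Counting characters with nchar() is a quick way to see how many backslashes survive each layer when a pattern does not behave as expected.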

Depending on your exact requirements, it might be easier to match all unwanted characters at once. For example, you could keep only word characters (\w) and whitespace (\s):

df %>% mutate(regexp_replace(word, "[^\\\\w\\\\s]", ""))

# Source:   lazy query [?? x 2]
# Database: spark_shell_connection
  word            `regexp_replace(word, "[^\\\\\\\\w\\\\\\\\s]", "")`
  <chr>           <chr>                                              
1 "(abc\\zyx: 1)" abczyx 1

or only word characters:

df %>% mutate(regexp_replace(word, "[^\\\\w]", ""))

# Source:   lazy query [?? x 2]
# Database: spark_shell_connection
  word            `regexp_replace(word, "[^\\\\\\\\w]", "")`
  <chr>           <chr>                                     
1 "(abc\\zyx: 1)" abczyx1
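
If the goal is to drop only the specific characters listed in the question, the separate regexp_replace calls could probably be collapsed into one character class that also contains the backslash. The sketch below checks such a class locally with base R's gsub() (using perl = TRUE as a rough stand-in for Java's regex engine) and then shows, in a comment, how the same pattern might be written for sparklyr. The sample vector x is made up for illustration, and the sparklyr form assumes the extra layer of backslash doubling described in the answer.

```r
# Character class listing the characters to remove. For local use, a literal
# backslash inside the class is typed as four backslashes in R source.
pattern_local <- "[()+?:;!\\\\]"

# Hypothetical sample data, not from the original question.
x <- c("(abc\\zyx: 1)", "what?!", "a+b;c")

cleaned <- gsub(pattern_local, "", x, perl = TRUE)
cleaned
# "abczyx 1" "what" "abc"

# For sparklyr, the backslashes are doubled once more (assumed, following
# the answer above), so the class would be typed with eight of them:
pattern_spark <- "[()+?:;!\\\\\\\\]"

# Usage with the Spark data frame from the question (needs a live connection):
# words.sdf <- words.sdf %>%
#   mutate(word = regexp_replace(word, "[()+?:;!\\\\\\\\]", ""))
```

A single class keeps the whitelist of removed characters in one place, whereas the negated classes above remove everything that is not a word character, which may be broader than intended.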
