How to remove '\' from a string in sparklyr

Published: 2021/4/8 19:42:43 | Tags: r, apache-spark, text, sparklyr


Question

I am using sparklyr and have a Spark dataframe with a column word that contains words, some of which contain special characters that I want to remove. I was successful using regexp_replace with \\\\ before each special character, like this:

words.sdf <- words.sdf %>% 
  mutate(word = regexp_replace(word, '\\\\(', '')) %>% 
  mutate(word = regexp_replace(word, '\\\\)', '')) %>% 
  mutate(word = regexp_replace(word, '\\\\+', '')) %>% 
  mutate(word = regexp_replace(word, '\\\\?', '')) %>%
  mutate(word = regexp_replace(word, '\\\\:', '')) %>%
  mutate(word = regexp_replace(word, '\\\\;', '')) %>%
  mutate(word = regexp_replace(word, '\\\\!', ''))

Now I want to remove \. I have tried both:

words.sdf <- words.sdf %>% 
  mutate(word = regexp_replace(word, '\\\\\', ''))

and:

words.sdf <- words.sdf %>% 
  mutate(word = regexp_replace(word, '\', ''))

But neither of them works.

Recommended answer

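You have to correct your code for both R-side and Java-side escaping. Both of your attempts end in \', which R reads as an escaped quote rather than as a backslash followed by the closing quote, so the string is not terminated where you intended. Each layer then consumes one level of escaping: R turns the eight backslashes in the source into four characters, Spark SQL's string-literal parser turns those four into two, and the Java regex \\ matches one literal backslash. So what you actually need is "\\\\\\\\":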

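library(sparklyr)
library(dplyr)
sc <- spark_connect(master = "local")  # assumes a local Spark installation
df <- copy_to(sc, tibble(word = "(abc\\zyx: 1)"))

df %>% mutate(regexp_replace(word, "\\\\\\\\", ""))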

# Source:   lazy query [?? x 2]
# Database: spark_shell_connection
  word           `regexp_replace(word, "\\\\\\\\\\\\\\\\", "")`
  <chr>          <chr>                                         
1 "(abc\\zyx: 1)" (abczyx: 1)
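A base-R sketch of the escaping chain, runnable without Spark. It assumes Spark SQL's default string-literal parsing (spark.sql.parser.escapedStringLiterals left at false) collapses each \\ pair into one backslash before the Java regex engine sees the pattern. The helpers escape_for_spark_sql() and unescape_sql_literal() are hypothetical, not part of sparklyr, and the second one only simulates the \\ case, not every escape Spark understands.

```r
# Hypothetical helper (not part of sparklyr): double every backslash so that a
# regex written the usual R way survives Spark SQL's string-literal unescaping.
escape_for_spark_sql <- function(regex) {
  gsub("\\\\", "\\\\\\\\", regex)
}

# Hypothetical, simplified simulation of the Spark SQL layer:
# only turns each pair of backslashes into a single backslash.
unescape_sql_literal <- function(literal) {
  gsub("\\\\\\\\", "\\\\", literal)
}

spark_pattern <- "\\\\\\\\"                         # 8 in the source, 4 characters
java_regex <- unescape_sql_literal(spark_pattern)   # 2 characters: \\

nchar(spark_pattern)                                # 4
nchar(java_regex)                                   # 2
identical(escape_for_spark_sql(java_regex), spark_pattern)  # TRUE

# Base R's regex engine, like Java's, reads \\ as one literal backslash.
gsub(java_regex, "", "(abc\\zyx: 1)")               # "(abczyx: 1)"
```

With a live connection you could then write df %>% mutate(regexp_replace(word, !!escape_for_spark_sql("\\\\"), "")), which sends the same four-character string as the literal in the answer.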

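Depending on your exact requirements, it might be easier to match all unwanted characters at once. For example, you could preserve only word characters (\w) and whitespace (\s). Note that inside the square brackets + is a literal character, not a quantifier, so the patterns below also keep any + in the input; use "[^\\\\w\\\\s]" if that is not what you want: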

df %>% mutate(regexp_replace(word, "[^\\\\w+\\\\s+]", ""))

# Source:   lazy query [?? x 2]
# Database: spark_shell_connection
  word            `regexp_replace(word, "[^\\\\\\\\w+\\\\\\\\s+]", "")`
  <chr>           <chr>                                                
1 "(abc\\zyx: 1)" abczyx 1

Or only word characters:

df %>% mutate(regexp_replace(word, "[^\\\\w+]", ""))

# Source:   lazy query [?? x 2]
# Database: spark_shell_connection
  word            `regexp_replace(word, "[^\\\\\\\\w+]", "")`
  <chr>           <chr>                                      
1 "(abc\\zyx: 1)" abczyx1
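A quick base-R check of how these character classes behave, including the fact that + inside brackets is kept as a literal. This sketch uses a hypothetical input containing +, and perl = TRUE (PCRE) as a stand-in for Java's regex engine, which treats these classes the same way for this ASCII input. Because base R has no Spark SQL layer, each backslash appears half as often as in the sparklyr calls.

```r
# Hypothetical input containing a "+" to expose the difference.
x <- "(a+b\\c: 1)"

# Single-escaped patterns: no Spark SQL string-literal layer in base R.
keep_word_space_plus <- gsub("[^\\w+\\s+]", "", x, perl = TRUE)
keep_word_space      <- gsub("[^\\w\\s]", "", x, perl = TRUE)
keep_word            <- gsub("[^\\w]", "", x, perl = TRUE)

keep_word_space_plus  # "a+bc 1" (the literal + survives)
keep_word_space       # "abc 1"
keep_word             # "abc1"
```

In sparklyr the corrected class would be written with doubled backslashes, for example regexp_replace(word, "[^\\\\w\\\\s]", "").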
