How to remove '\' from a string in sparklyr
Question
I am using sparklyr and have a Spark DataFrame with a column word that contains words, some of which contain special characters that I want to remove. I succeeded by using regexp_replace and putting \\\\ before each special character, like this:
words.sdf <- words.sdf %>%
mutate(word = regexp_replace(word, '\\\\(', '')) %>%
mutate(word = regexp_replace(word, '\\\\)', '')) %>%
mutate(word = regexp_replace(word, '\\\\+', '')) %>%
mutate(word = regexp_replace(word, '\\\\?', '')) %>%
mutate(word = regexp_replace(word, '\\\\:', '')) %>%
mutate(word = regexp_replace(word, '\\\\;', '')) %>%
mutate(word = regexp_replace(word, '\\\\!', ''))
Now I want to remove \. I have tried both:
words.sdf <- words.sdf %>%
mutate(word = regexp_replace(word, '\\\\\', ''))
and:
words.sdf <- words.sdf %>%
mutate(word = regexp_replace(word, '\', ''))
but neither of them works.
Answer
You have to account for escaping on both the R side and the Spark (Java) side, so what you actually need is "\\\\\\\\" (eight backslashes in the R source):
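library(sparklyr)
library(dplyr)
sc <- spark_connect(master = "local")  # assumes a local Spark installation
df <- copy_to(sc, tibble(word = "(abc\\zyx: 1)"))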
df %>% mutate(regexp_replace(word, "\\\\\\\\", ""))
# Source: lazy query [?? x 2]
# Database: spark_shell_connection
word `regexp_replace(word, "\\\\\\\\\\\\\\\\", "")`
<chr> <chr>
1 "(abc\\zyx: 1)" (abczyx: 1)
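To see why eight backslashes are needed, it can help to peel off the escaping layers one at a time. The base R sketch below needs no Spark connection. It assumes that the pattern reaches Spark SQL as a string literal whose backslash escapes Spark unescapes once before handing the result to Java's regex engine; the gsub() call that stands in for that Spark step is a simplification, not Spark's actual parser, and base R's PCRE engine stands in for Java's.

```r
# Layer 1: R's own string parser. Every "\\" in R source is ONE backslash,
# so the 8 backslashes written in the source become 4 characters.
r_pattern <- "\\\\\\\\"
stopifnot(nchar(r_pattern) == 4)

# Layer 2 (assumed behaviour): Spark SQL unescapes the string literal,
# collapsing each pair of backslashes into one. This gsub() is only a
# stand-in for that step: regex "\\\\" matches two backslashes and the
# replacement "\\\\" inserts a single literal backslash.
sql_unescaped <- gsub("\\\\\\\\", "\\\\", r_pattern)
cat(sql_unescaped, "\n")  # the regex Java receives: two backslashes

# Layer 3: the regex engine. A doubled backslash matches ONE literal
# backslash, which is exactly the character we want to delete.
input <- "(abc\\zyx: 1)"  # this R literal contains a single backslash
result <- gsub(sql_unescaped, "", input, perl = TRUE)
print(result)  # [1] "(abczyx: 1)"
```

The same counting explains the failed attempts in the question: both '\\\\\' and '\' end in \', which R reads as an escaped quote rather than as the closing quote, so the code does not parse as intended. Writing only four backslashes would presumably leave a lone backslash after Spark's unescaping, which is not a valid regex.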
Depending on your exact requirements, it might be easier to match all unwanted characters at once. For example, you could keep only word characters (\w) and whitespace (\s):
df %>% mutate(regexp_replace(word, "[^\\\\w\\\\s]", ""))
# Source: lazy query [?? x 2]
# Database: spark_shell_connection
word `regexp_replace(word, "[^\\\\\\\\w\\\\\\\\s]", "")`
<chr> <chr>
1 "(abc\\zyx: 1)" abczyx 1
or only word characters:
df %>% mutate(regexp_replace(word, "[^\\\\w]", ""))
# Source: lazy query [?? x 2]
# Database: spark_shell_connection
word `regexp_replace(word, "[^\\\\\\\\w]", "")`
<chr> <chr>
1 "(abc\\zyx: 1)" abczyx1
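Because only the regex layer is involved once Spark has unescaped the literal, a character class can be sanity-checked locally before it is sent to Spark. The sketch below assumes that Spark receives [^\w\s] and [^\w] after unescaping, and writes those regexes in base R with one level of R escaping. Base R's PCRE engine (perl = TRUE) is used as a stand-in for Java's; the two agree on ASCII input like this but may differ on Unicode classes.

```r
input <- "(abc\\zyx: 1)"  # contains a single literal backslash

# Keep word characters and whitespace (regex Spark would see: [^\w\s])
keep_words_spaces <- gsub("[^\\w\\s]", "", input, perl = TRUE)
print(keep_words_spaces)  # [1] "abczyx 1"

# Keep word characters only (regex Spark would see: [^\w])
keep_words <- gsub("[^\\w]", "", input, perl = TRUE)
print(keep_words)  # [1] "abczyx1"

# Inside [...] a "+" is a literal plus sign, not a quantifier,
# so [^\w+] also preserves "+" characters.
plus_kept <- gsub("[^\\w+]", "", "a+b c", perl = TRUE)
print(plus_kept)  # [1] "a+bc"
```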