Remove all special characters from a string in R?
Question
How do I remove all special characters from a given string in R and replace each special character with a space?
The special characters to remove are: ~!@#$%^&*(){}_+:"<>?,./;'[]-=
The regex [:punct:] will do half of the job.
Question 2: How do I remove those crazy characters: â í ü Â á?
Answer 2: Try replacing [^[:alnum:]] with [^a-zA-Z0-9] in regex or regexpr.
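A minimal base R sketch of Answer 2. It assumes gsub with perl = TRUE (a choice made here, not in the original), so that the ranges a-z, A-Z and 0-9 are plain ASCII code-point ranges rather than locale-dependent ones. Because [^a-zA-Z0-9] lists only ASCII letters and digits, the accented letters from Question 2 are replaced along with the spaces between them:

```r
# The accented letters from Question 2, plus some ASCII text for comparison
y <- "\u00e2 \u00ed \u00fc \u00c2 \u00e1 abc 123"

# Replace everything that is not an ASCII letter or digit with a space
ascii_only <- gsub("[^a-zA-Z0-9]", " ", y, perl = TRUE)
ascii_only
# returns "          abc 123" (ten spaces, then "abc 123")
```

With [^[:alnum:]] instead, a UTF-8 locale may count letters such as â or á as alphanumeric and keep them, which is why the explicit ASCII class helps here.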
Recommended answer
You need to use regular expressions to identify the unwanted characters. For the most easily readable code, you want the str_replace_all function from the stringr package, though gsub from base R works just as well.
The exact regular expression depends upon what you are trying to do. You could just remove those specific characters that you gave in the question, but it's much easier to remove all punctuation characters.
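If you do want to target only the characters listed in the question, one way (a sketch, not taken from the original answer) is to build a character class from that list. The names specials, pattern and replace_listed are hypothetical. Every character is escaped with a backslash and perl = TRUE is used, because in PCRE a backslash before a non-alphanumeric ASCII character always matches that character literally, which sidesteps the special meaning of ], -, ^ and [ inside a bracket expression:

```r
# The special characters listed in the question
specials <- "~!@#$%^&*(){}_+:\"<>?,./;'[]-="

# Split into single characters and escape each one, giving "[\~\!\@...]"
chars <- strsplit(specials, "")[[1]]
pattern <- paste0("[", paste0("\\", chars, collapse = ""), "]")

# Hypothetical helper: replace each listed character with a space
replace_listed <- function(x) gsub(pattern, " ", x, perl = TRUE)

replace_listed("x-y=z")  # returns "x y z"
replace_listed("a|b")    # returns "a|b": "|" is punctuation but not in the list
```

This is more work than [[:punct:]], which is the point the answer makes: the explicit list only pays off when some punctuation must survive.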
library(stringr)
x <- "a1~!@#$%^&*(){}_+:\"<>?,./;'[]-=" # or whatever
str_replace_all(x, "[[:punct:]]", " ")
(The base R equivalent is gsub("[[:punct:]]", " ", x).)
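As a quick check of the base R version (no packages needed), assuming an ASCII string like the one in the answer: each of the 29 special characters in x is replaced by its own space, so the string keeps its length:

```r
x <- "a1~!@#$%^&*(){}_+:\"<>?,./;'[]-="

# Replace each POSIX punctuation character with a single space
y <- gsub("[[:punct:]]", " ", x)

nchar(y) == nchar(x)   # TRUE: one space per replaced character
trimws(y)              # "a1" once the trailing spaces are trimmed
```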
An alternative is to replace all non-alphanumeric characters:
str_replace_all(x, "[^[:alnum:]]", " ")
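One practical difference between the two patterns, sketched in base R: [^[:alnum:]] also catches whitespace and control characters such as tabs and newlines, which [[:punct:]] leaves alone. The sample string z is only for illustration:

```r
z <- "a\tb\nc!"

punct_only <- gsub("[[:punct:]]", " ", z)  # only "!" is punctuation
non_alnum  <- gsub("[^[:alnum:]]", " ", z) # tab, newline and "!" are all replaced

punct_only  # "a\tb\nc "
non_alnum   # "a b c "
```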
Note that the definition of what constitutes a letter, a number or a punctuation mark varies slightly depending upon your locale, so you may need to experiment a little to get exactly what you want.