Remove special characters from data frame

Views: 125 · Published: 2020/7/31 3:28:34 · Tags: r regex grep gsub non-printing-characters

Question

I have a matrix that contains the string "Energy per �m". Before the 'm' is a diamond-shaped symbol with a question mark in it, and I don't know what it is.

I have tried to get rid of it by running this on the column of the matrix:

a=gsub('Energy per �m','',a)

(I copied and pasted the string for the first argument of gsub.) It does not work; R reports: unexpected symbol in "a=rep(5,Energy per". When I try to extract something from the original matrix with grepl, I get this warning:

46: In grepl("ref. value", raw$parameter) :
input string 15318 is invalid in this locale

How can I get rid of all characters of this sort? I would like to keep only 0-9, A-Z, a-z, / and '. Everything else can be removed.

Recommended answer

There is probably a better way to do this than with a regex (e.g. by changing the Encoding).
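The encoding route mentioned above could look roughly like the following. This is only a sketch: it assumes the stray byte is a Latin-1 micro sign (0xB5), which the question does not confirm, and the names raw_label, label_utf8 and cleaned are made up for illustration.

```r
# Assumption: the raw bytes are Latin-1, where 0xB5 is the micro sign.
raw_label <- "Energy per \xb5m"

# A lone 0xB5 byte is not valid UTF-8, which is the kind of input that
# triggers "input string ... is invalid in this locale" in a UTF-8 session.
validUTF8(raw_label)   # FALSE

# Re-encode from the assumed source encoding to UTF-8.
label_utf8 <- iconv(raw_label, from = "latin1", to = "UTF-8")
validUTF8(label_utf8)  # TRUE

# With a valid string, the whitelist regex can be applied safely.
cleaned <- gsub("[^0-9A-Za-z/' ]", "", label_utf8)
print(cleaned)         # [1] "Energy per m"
```

If the data actually came from a different encoding, the from argument would need to change accordingly; iconvlist() shows the names available on a given system.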

But here is your regex solution:

gsub("[^0-9A-Za-z/' ]", "", a)
[1] "Energy per m"

But, as pointed out by @JoshuaUlrich, you're better off using:

gsub("[^[:alnum:]/' ]", "", a)
[1] "Energy per m"
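
Since the title asks about a data frame rather than a single vector, here is a hedged sketch of applying the same whitelist to every character column. The data frame contents and the helper name strip_special are invented for illustration. It uses the explicit ASCII ranges instead of [:alnum:], because [:alnum:] is locale-dependent and may treat some non-ASCII letters as alphanumeric.

```r
# Hypothetical example data; only the column name `parameter` echoes the question.
raw <- data.frame(
  parameter = c("Energy per \u00b5m", "ref. value", "driver's load/unit"),
  value = c(1.5, 2, 3),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only 0-9, A-Z, a-z, /, ' and spaces.
strip_special <- function(x) gsub("[^0-9A-Za-z/' ]", "", x)

# Clean character columns only, leaving numeric columns untouched.
is_text <- vapply(raw, is.character, logical(1))
raw[is_text] <- lapply(raw[is_text], strip_special)

print(raw$parameter)
# [1] "Energy per m"       "ref value"          "driver's load/unit"
```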
