Removing rows from a dataframe that contain a string in a particular column
Question
I'm cleaning up a huge data file in R. An example looks like this:
ID Score
1001 4
1002 20
1003 h
1004 v
1005 30
Because the class of the Score column is "character", I want to use the as.numeric function to convert 4, 20 and 30 to numeric values. But since the data is dirty (it contains unreasonable strings such as h and v), I get the message:
NAs introduced by coercion.
when I run the following command:
as.numeric(df$Score)
So what I want to do now is remove the rows of the dataframe whose Score contains letters, so that I obtain:
ID Score
1001 4
1002 20
1005 30
Recommended answer
There are multiple ways to do this:
Convert to numeric and remove the NA values:
subset(df, !is.na(as.numeric(Score)))
# ID Score
#1 1001 4
#2 1002 20
#5 1005 30
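The subset() call above still triggers the coercion warning and leaves Score as character. A minimal sketch of the same idea that silences the warning and stores the numeric values follows; the names `num` and `clean` are hypothetical, and it assumes that rows whose Score is already NA should be dropped as well.

```r
# Same sample data as in the Data section of this answer
df <- data.frame(ID = 1001:1005,
                 Score = c("4", "20", "h", "v", "30"),
                 stringsAsFactors = FALSE)

# Coerce once; strings that are not numbers become NA.
# The warning is suppressed deliberately because the NAs are handled below.
num <- suppressWarnings(as.numeric(df$Score))

# Keep only the rows where the coercion succeeded,
# then replace the character column with the numeric values.
clean <- df[!is.na(num), ]
clean$Score <- num[!is.na(num)]

str(clean)
```

Coercing once and reusing the result avoids calling as.numeric twice on a large column.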
Or use grepl to find the values that contain any non-digit character and remove those rows:
subset(df, !grepl('\\D', Score))
This can also be done with grep:
df[grep('\\D', df$Score, invert = TRUE), ]
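The two families of approaches are not equivalent on every input. The pattern '\\D' flags any non-digit character, so it would also drop decimals and negative numbers, while an empty string passes it. The sketch below compares both checks on a hypothetical vector `x` that extends the sample scores with a few extra values that are not in the original data.

```r
# Hypothetical inputs: the sample scores plus a decimal, a negative and an empty string
x <- c("4", "20", "h", "v", "30", "3.5", "-2", "")

# TRUE when the string contains no non-digit character
digits_only <- !grepl("\\D", x)

# TRUE when as.numeric() can parse the string
coercible <- !is.na(suppressWarnings(as.numeric(x)))

# Side-by-side comparison of the two checks
data.frame(x, digits_only, coercible)
```

If Score can hold only non-negative integers, both checks agree on non-empty values. Otherwise the as.numeric approach is probably the safer choice.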
Data
df <- structure(list(ID = 1001:1005, Score = c("4", "20", "h", "v",
"30")), class = "data.frame", row.names = c(NA, -5L))