Remove columns from dataframe where ALL values are NA, NULL or empty
Question
I have a dataframe where some of the values are NULL or empty. I would like to remove the columns in which all values are NULL or empty. The columns should be removed from the dataframe, not merely hidden.
My head(df) looks like this:
  VAR1 VAR2 VAR3 VAR4 VAR5 VAR6 VAR7
1  2R+        52 1.05    0    0   30
2  2R+       169 1.02    0    0   40
3  2R+        83   NA    0    0   40
4  2R+        98 1.16    0    0   40
5  2R+       154 1.11    0    0   40
6  2R+       111   NA    0    0   15
The dataframe contains more than 200 variables, and the empty and all-zero variables do not occur consecutively.
I tried to select the columns that are NULL or empty by analogy with the removal of all-"NA" columns (see here), but it does not work.
df <- df[,colSums(is.na(df))<nrow(df)]
I got an error: 'x' must be an array of at least two dimensions
Can anyone give me some help? Thanks!
Recommended answer
We can use Filter
Filter(function(x) !(all(x=="")), df)
# Var1 Var3
#1 2R+ 52
#2 2R+ 169
#3 2R+ 83
#4 2R+ 98
#5 2R+ NA
#6 2R+ 111
#7 2R+ 94
#8 2R+ 116
#9 2R+ 86
NOTE: It should also work if all the elements of a particular column are NA
df$Var3 <- NA
Filter(function(x) !(all(x=="")), df)
# Var1
#1 2R+
#2 2R+
#3 2R+
#4 2R+
#5 2R+
#6 2R+
#7 2R+
#8 2R+
#9 2R+
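The all-NA case works somewhat implicitly: for such a column all(x == "") evaluates to NA rather than TRUE or FALSE, and Filter keeps only the columns whose predicate is TRUE. A minimal sketch that spells the NA test out is shown below; the helper name is_blank_col and the data frame demo are hypothetical and only for illustration, not part of the original answer.

```r
# Hypothetical helper: TRUE when every value in x is NA or an empty string.
is_blank_col <- function(x) all(is.na(x) | x == "")

# Small illustrative data frame (not the asker's data).
demo <- data.frame(
  a = c("2R+", "2R+", "2R+"),  # real values: kept
  b = c("", "", ""),           # all empty strings: dropped
  c = c(NA, NA, NA),           # all NA: dropped
  d = c("", NA, ""),           # mix of empty strings and NA: dropped
  e = c(52L, NA, 83L),         # some NA, but real values too: kept
  stringsAsFactors = FALSE
)

# Keep the columns that are not entirely blank.
kept <- Filter(function(x) !is_blank_col(x), demo)
names(kept)
```

Writing is.na(x) explicitly makes the intent visible to a reader and does not rely on how Filter treats an NA result from the predicate.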
Update
Based on the updated dataset, if we also need to remove the columns that contain only 0 values, then change the code to
Filter(function(x) !(all(x==""|x==0)), df2)
# VAR1 VAR3 VAR4 VAR7
#1 2R+ 52 1.05 30
#2 2R+ 169 1.02 40
#3 2R+ 83 NA 40
#4 2R+ 98 1.16 40
#5 2R+ 154 1.11 40
#6 2R+ 111 NA 15
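The same selection can also be written with a logical index, which is closer to the colSums attempt in the question. This is only a sketch: the helper name is_empty_col and the data frame demo2 are hypothetical. It also assumes, since the question does not show how df was created, that the reported error may have come from df no longer being a two-dimensional object; one way that can happen is that df[, i] returns a plain vector when only one column is selected, which drop = FALSE prevents.

```r
# Hypothetical helper: TRUE when every value in x is NA, "" or 0.
is_empty_col <- function(x) all(is.na(x) | x == "" | x == 0)

# Small illustrative data frame (not the asker's data).
demo2 <- data.frame(
  VAR1 = c("2R+", "2R+", "2R+"),
  VAR2 = c("", "", ""),
  VAR3 = c(52L, 169L, 83L),
  VAR4 = c(1.05, NA, 1.16),
  VAR5 = c(0L, 0L, 0L),
  stringsAsFactors = FALSE
)

# One TRUE/FALSE flag per column; vapply guarantees a logical vector.
drop_col <- vapply(demo2, is_empty_col, logical(1))

# drop = FALSE keeps the result a data frame even if one column survives.
result <- demo2[, !drop_col, drop = FALSE]
names(result)

# Selecting a single column this way still yields a data frame.
single <- demo2[, names(demo2) == "VAR3", drop = FALSE]
is.data.frame(single)
```

Because the flags are computed once and stored in drop_col, names(demo2)[drop_col] can be inspected first to review which of the 200+ variables would be removed.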
Data
df2 <- structure(list(VAR1 = c("2R+", "2R+", "2R+", "2R+", "2R+", "2R+"
), VAR2 = c("", "", "", "", "", ""), VAR3 = c(52L, 169L, 83L,
98L, 154L, 111L), VAR4 = c(1.05, 1.02, NA, 1.16, 1.11, NA), VAR5 = c(0L,
0L, 0L, 0L, 0L, 0L), VAR6 = c(0L, 0L, 0L, 0L, 0L, 0L), VAR7 = c(30L,
40L, 40L, 40L, 40L, 15L)), .Names = c("VAR1", "VAR2", "VAR3",
"VAR4", "VAR5", "VAR6", "VAR7"), row.names = c("1", "2", "3",
"4", "5", "6"), class = "data.frame")