Remove NA columns in a list of dataframes


Question

I am having trouble cleaning data that I imported from Excel with readxl. readxl created a large list of objects with classes = c('data.frame', tbl_df, tbl) (I would also like to know why and how each object has multiple classes assigned to it). Each of those objects is one of the sheets in the original Excel workbook. The problem is that each of those objects (sheets) may have many columns entirely filled with NAs. I have searched Stack Overflow, found some similar problems, and tried to apply the given solutions like here and here (the first one is the most like my problem). However, when I try this:

lapply(x, function(y) y[, !is.na(y)])

I get the following error:

Error in `[.data.frame`(y, , !is.na(y)) : undefined columns selected

I have also tried:

lapply(x, function(y) y[!is.na(y)])

but that reduces each of my dataframes to only its first column. I think it has something to do with my dataframe-within-list syntax. I've experimented with different variations of y[[]][] and even recently found this interesting pattern in lapply: lapply(x, "[[", y), but couldn't make it work.

Here are the first two objects in my list of dataframes (any hints on how to dput this data more efficiently are also appreciated). As you can see, the first object has no NA columns, whereas the second has 5 NA columns. I would like to remove those 5 NA columns, and do the same for every object in my list.

Any help would be much appreciated!

dput(head(x[[1]]))
structure(list(Date = structure(c(1305504000, 1305504000, 1305504000, 
1305504000, 1305504000, 1305504000), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), Time = structure(c(-2209121912, -2209121612, 
-2209121312, -2209121012, -2209120712, -2209120412), class = c("POSIXct", 
"POSIXt"), tzone = "UTC"), Level = c(106.9038, 106.9059, 106.89, 
106.9121, 106.8522, 106.8813), Temperature = c(6.176, 6.173, 
6.172, 6.168, 6.166, 6.165)), .Names = c("Date", "Time", "Level", 
"Temperature"), row.names = c(NA, 6L), class = c("tbl_df", "tbl", 
"data.frame"))

dput(head(x[[2]]))
structure(list(Date = structure(c(1305504000, 1305504000, 1305504000, 
1305504000, 1305504000, 1305504000), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), Time = structure(c(-2209121988, -2209121688, 
-2209121388, -2209121088, -2209120788, -2209120488), class = c("POSIXct", 
"POSIXt"), tzone = "UTC"), LEVEL = c(117.5149, 117.511, 117.5031, 
117.5272, 117.4523, 117.4524), TEMPERATURE = c(5.661, 5.651, 
5.645, 5.644, 5.644, 5.645), `NA` = c(NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_), `NA` = c(NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_), `NA` = c(NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_), `NA` = c(NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_), `NA` = c(NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_)), .Names = c("Date", "Time", "LEVEL", 
"TEMPERATURE", NA, NA, NA, NA, NA), row.names = c(NA, 6L), class =    
c("tbl_df", "tbl", "data.frame"))

Recommended answer

How about this:

lapply(df_list, function(df) df[, colSums(is.na(df)) == 0])

Or:

lapply(df_list, function(df) df[, colSums(is.na(df)) < nrow(df)])

if you want to keep columns in which some, but not all, rows are NA (only columns that are entirely NA are dropped).
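The two variants above can be compared on a small example. This is only a sketch: the list `df_list`, its sheet and column names, and the helper names `drop_all_na` and `drop_any_na` are made up for illustration and are not part of the original question or answer. It also shows why the attempts in the question fail: `is.na()` on a data frame returns a logical matrix with one entry per cell, not one value per column, whereas `colSums()` collapses it to one count per column. Adding `drop = FALSE` is an optional safeguard so that a plain data.frame is not simplified to a vector when only one column survives.

```r
# Hypothetical two-sheet list standing in for the readxl output.
df_list <- list(
  sheet1 = data.frame(Level = c(106.9, 106.8, 106.7),
                      Temperature = c(6.17, 6.16, 6.15)),
  sheet2 = data.frame(LEVEL = c(117.5, NA, 117.4),
                      TEMPERATURE = c(5.66, 5.65, 5.64),
                      empty1 = NA_real_,
                      empty2 = NA_real_)
)

# is.na() on a data frame gives a rows-by-columns logical matrix,
# so it cannot be used directly as a column selector.
dim(is.na(df_list$sheet2))

# Keep columns that have at least one non-NA value.
# drop = FALSE keeps a data frame even if a single column survives.
drop_all_na <- function(df) {
  df[, colSums(is.na(df)) < nrow(df), drop = FALSE]
}

# Stricter variant: keep only columns that contain no NA at all.
drop_any_na <- function(df) {
  df[, colSums(is.na(df)) == 0, drop = FALSE]
}

cleaned <- lapply(df_list, drop_all_na)
strict  <- lapply(df_list, drop_any_na)

names(cleaned$sheet2)  # LEVEL and TEMPERATURE are kept
names(strict$sheet2)   # only TEMPERATURE is kept, LEVEL has one NA
```

For the data in the question, where the unwanted columns are entirely NA and the measurement columns may contain occasional gaps, the `< nrow(df)` variant is probably the safer choice.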
