How to clean or remove NA values from a dataset without removing the column or row


Question

Is there an elegant way to clean the NA values out of a data frame without removing the rows or columns where the NAs occur?

Example:

Input data frame

    C1    C2     C3
 R1  A   <NA>  <NA>
 R2 <NA>  A    <NA>
 R3 <NA> <NA>   A
 R4  B   <NA>  <NA>
 R5 <NA>  B    <NA>
 R6 <NA> <NA>  <NA>
 R7  C   <NA>   B
 R8       C    <NA>
 R9            <NA>
 R10           <NA>
 R11            C

Output data frame

    C1  C2  C3
R1  A   A   A
R2  B   B   B
R3  C   C   C

For example, here is a messy data frame (df1) full of NA values:

    A       B       C       D       E       F    G    H    I    J    K
1 Healthy    <NA>    <NA>    <NA>    <NA>    <NA> <NA> <NA> <NA> <NA> <NA>
2    <NA> Healthy    <NA>    <NA>    <NA>    <NA> <NA> <NA> <NA> <NA> <NA>
3    <NA>    <NA> Healthy    <NA>    <NA>    <NA> <NA> <NA> <NA> <NA> <NA>
4    <NA>    <NA>    <NA> Healthy    <NA>    <NA> <NA> <NA> <NA> <NA> <NA>
5    <NA>    <NA>    <NA>    <NA> Healthy    <NA> <NA> <NA> <NA> <NA> <NA>
6    <NA>    <NA>    <NA>    <NA>    <NA> Healthy <NA> <NA> <NA> <NA> <NA>

Here is how the data frame should look:

   X1        X2        X3      X4        X5        X6        X7      X8      X9       X10       X11
1 Healthy   Healthy   Healthy Healthy   Healthy   Healthy   Healthy Healthy Healthy   Healthy   Healthy
2 Healthy   Healthy   Healthy Healthy   Healthy   Healthy   Healthy Healthy Healthy   Healthy   Healthy
3 Healthy ICDAS_1_2 ICDAS_1_2 Healthy ICDAS_1_2 ICDAS_1_2 ICDAS_1_2 Healthy Healthy ICDAS_1_2 ICDAS_1_2
4 Healthy   Healthy   Healthy Healthy   Healthy   Healthy   Healthy Healthy Healthy   Healthy   Healthy
5 Healthy   Healthy   Healthy Healthy   Healthy   Healthy   Healthy Healthy Healthy   Healthy   Healthy
6 Healthy   Healthy   Healthy Healthy   Healthy   Healthy   Healthy Healthy Healthy   Healthy   Healthy

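Note that cell B-2 of the original data frame is now at X2-1. So the main issue here is finding the equivalent of the "delete cells and shift all cells up" function from Calc or Excel.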

All the answers I found delete the entire row or column where the <NA> value was. The way I managed to do it (sorry if this is primitive) was to extract only the valid values into a new data frame:

First, I create an empty data frame:

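library("data.table") # loaded by the asker; the base R calls below do not depend on it
# Empty 1400 x 11 data frame of "" strings; columns are auto-named X1..X11
new_dataframe <- data.frame(matrix("", ncol = 11, nrow = 1400))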

Then I copy every valid value from the old data frame to the new one:

# Keep only the non-NA entries of each column (df1 is the messy data frame above)
new_dataframe$X1 <- df1$A[!is.na(df1$A)]
new_dataframe$X2 <- df1$B[!is.na(df1$B)]
new_dataframe$X3 <- df1$C[!is.na(df1$C)]

So my question is: is there a more elegant way to "clean" the NA values out of a data frame?

Any help is greatly appreciated.

Recommended answer

If this works for you manually:

new_dataframe$X1 <- df1$A[!is.na(df1$A)]
new_dataframe$X2 <- df1$B[!is.na(df1$B)]
new_dataframe$X3 <- df1$C[!is.na(df1$C)]

then this should do the same thing automatically:

new_dataframe = as.data.frame(lapply(df1, na.omit))

It works on an arbitrary number of columns. (A more direct translation of your code is what Pierre suggested in the comments: as.data.frame(lapply(mydf, function(x) x[!is.na(x)])).)
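As a minimal sketch of the column-wise approach, the block below applies the same idea to a small made-up data frame shaped like the first example in the question, where every column holds exactly three non-NA values. The name `df` and its contents are illustrative assumptions, not the asker's data.

```r
# Made-up data: each column has exactly three non-NA values (A, B, C)
df <- data.frame(
  C1 = c("A", NA, NA, "B", NA, NA, "C", NA, NA),
  C2 = c(NA, "A", NA, NA, "B", NA, NA, "C", NA),
  C3 = c(NA, NA, "A", NA, NA, "B", NA, NA, "C"),
  stringsAsFactors = FALSE
)

# Drop the NAs column by column, then reassemble the shortened columns.
# This only lines up because all three columns keep the same number of values.
cleaned <- as.data.frame(
  lapply(df, function(col) col[!is.na(col)]),
  stringsAsFactors = FALSE
)
print(cleaned)
#   C1 C2 C3
# 1  A  A  A
# 2  B  B  B
# 3  C  C  C
```

Each column is effectively "shifted up" independently, so values that shared a row in `df` no longer necessarily share a row in `cleaned`.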

Beware that data frames must be rectangular (each column must have the same number of rows), so this will work as you might hope and expect only if each column has the same number of non-missing values. If some columns have fewer non-missing values, they will be recycled to fill out the length of the data frame:

x = data.frame(a = c(1, NA, 2), b = c(2, NA, 3), c = c(NA, "A", NA))
x
#    a  b    c
# 1  1  2 <NA>
# 2 NA NA    A
# 3  2  3 <NA>

as.data.frame(lapply(x, na.omit))
#   a b c
# 1 1 2 A
# 2 2 3 A
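One caveat worth illustrating: the recycling above happens because the short column's length (1) divides the longer ones (2). When the lengths do not divide evenly, data.frame() refuses to recycle and raises an error instead. The sketch below uses a made-up data frame `z` and captures the error message rather than stopping.

```r
# Made-up data: column a keeps 2 values, column b keeps 3
z <- data.frame(a = c(1, NA, 2), b = c(4, 5, 6))
kept <- lapply(z, function(col) col[!is.na(col)])

# 3 is not a multiple of 2, so the columns cannot be recycled into a rectangle;
# tryCatch() returns the error message as a string instead of aborting
result <- tryCatch(
  as.data.frame(kept),
  error = function(e) conditionMessage(e)
)
print(result)
```

Here `result` ends up as a character string holding the error message, not a data frame, which is a good reason to check the lengths before coercing.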

A better approach might be to convert to a list first:

y = lapply(x, na.omit)

You can then check what you've got with sapply(y, length) before deciding whether to coerce to a data frame.
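If the lengths turn out to differ and you still want a data frame, one option (not part of the original answer) is to pad the shorter columns with NA at the bottom, which mimics the spreadsheet "shift cells up" behaviour. A minimal sketch, where `pad_to` is a hypothetical helper introduced only for illustration:

```r
x <- data.frame(a = c(1, NA, 2), b = c(2, NA, 3), c = c(NA, "A", NA),
                stringsAsFactors = FALSE)

# Keep the non-missing values of each column as a plain list
y <- lapply(x, function(col) col[!is.na(col)])

# Inspect how many values survived per column: a = 2, b = 2, c = 1
lens <- sapply(y, length)
print(lens)

# Hypothetical helper: extending a vector's length fills the tail with NA
pad_to <- function(v, n) {
  length(v) <- n
  v
}

# Pad every column to the longest length, then coerce to a data frame
padded <- as.data.frame(lapply(y, pad_to, n = max(lens)),
                        stringsAsFactors = FALSE)
print(padded)
```

The NAs do not disappear in this variant; they are only pushed to the end of each column, so no value is silently duplicated by recycling.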
