Define multiple values as missing in a data frame


Question

How do I define multiple values as missing in a data frame in R?

Consider a data frame where two values, 888 and 999, represent missing data:

df <- data.frame(age=c(50,30,27,888),insomnia=c("yes","no","no",999))
df[df==888] <- NA
df[df==999] <- NA

This solution takes one line of code per value representing missing data. Is there a simpler solution for situations where many different values represent missing data?

Recommended answer

Here are three solutions:

# 1. Data set
df <- data.frame(
  age = c(50, 30, 27, 888),
  insomnia = c("yes", "no", "no", 999))

# 2. Solution based on "one line of code per missing data value"
df[df == 888] <- NA
df[df == 999] <- NA
is.na(df)

# 3. Solution based on "applying function to each column of data set"
# (re-create df from step 1 first; step 2 has already replaced the values)
df[sapply(df, function(x) as.character(x) %in% c("888", "999") )] <- NA
is.na(df)

# 4. Solution based on "dplyr"

# 4.1. Load package
library(dplyr)

# 4.2. Define function for missing values
is_na <- function(x) {
  as.character(x) %in% c("888", "999")
}

# 4.3. Apply function to each column of the original data from step 1
# (returns a list of logical vectors; df itself is not modified)
df %>% lapply(is_na)
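
The lapply() call in step 4.3 only reports which entries match the missing-data codes; it does not change the data frame. Below is a minimal sketch of how such a mask could be used to actually set those entries to NA, written in base R so that it runs without extra packages. The names missing_codes and the codes argument are illustrative assumptions and are not part of the original answer.

```r
# Codes that stand for missing data (the same two codes as in the question).
# Adding more codes here needs no further lines of code.
missing_codes <- c("888", "999")

# Same idea as the helper in step 4.2, with the codes passed as an argument
is_na <- function(x, codes = missing_codes) {
  as.character(x) %in% codes
}

# Data set from step 1
df <- data.frame(
  age = c(50, 30, 27, 888),
  insomnia = c("yes", "no", "no", 999))

# Replace the flagged entries column by column.
# Assigning into df[] keeps the result a data frame with the same column names.
df[] <- lapply(df, function(x) replace(x, is_na(x), NA))

is.na(df)
```

With dplyr 1.0 or later, the same replacement can be written as df %>% mutate(across(everything(), ~ replace(.x, is_na(.x), NA))). Either way, the age column stays numeric because only the flagged entries are overwritten.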
