cut() a variable with missing values


Question

What's a good way to cut() a quantitative variable into levels, including a final level dedicated to NAs?

I'd prefer something like the .missing parameter that tidyverse functions commonly offer (e.g., dplyr::recode() and dplyr::if_else()).

If the input is w and this hypothetical function is named cut_with_nas, then the following code

w <- c(0L, NA_integer_, 22:25, NA_integer_, 40)
cut_with_nas(w, breaks=2)

would produce the desired output:

[1] (-0.04,20] Unknown    (20,40]    (20,40]    (20,40]    (20,40]    Unknown    (20,40]   
Levels: (-0.04,20] (20,40] Unknown

I'm posting a function below that accomplishes this, but I was hoping there's a more concise solution, or at least a tested function that already exists in a package.

Recommended answer

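cut_with_nas <- function(x, breaks, labels = NULL, .missing = "Unknown") {
  # Bin the values; any NA in x stays NA in the resulting factor.
  y <- cut(x, breaks = breaks, labels = labels) # optionally: include.lowest = TRUE, right = FALSE
  # Promote NA to an explicit factor level, appended after the interval levels.
  y <- addNA(y)
  # Rename that NA level to the caller-supplied label.
  levels(y)[is.na(levels(y))] <- .missing
  return(y)
}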

The majority of this function steals heavily from a response by @akrun three years ago.
(And a little from this unanswered question too.)
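
As a quick sanity check, here is a minimal sketch that applies the function to the input from the question and inspects the result. The definition is repeated only so the snippet runs on its own; the variable name f and the choice of checks (levels(), table(), anyNA()) are illustrative assumptions, not part of the original answer.

```r
# Same definition as in the answer, repeated so this snippet is self-contained.
cut_with_nas <- function(x, breaks, labels = NULL, .missing = "Unknown") {
  y <- cut(x, breaks = breaks, labels = labels)
  y <- addNA(y)
  levels(y)[is.na(levels(y))] <- .missing
  y
}

# Input from the question: two missing values among eight elements.
w <- c(0L, NA_integer_, 22:25, NA_integer_, 40)
f <- cut_with_nas(w, breaks = 2)

levels(f)  # interval levels first, then the level used for missing values
table(f)   # counts per level, including the missing-value level
anyNA(f)   # checks whether any element of the factor is still NA
```

Note that addNA() adds the extra level even when x has no missing values, so the "Unknown" level is always present in the result. That keeps the set of levels stable across inputs, which may or may not be what you want.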
