Avoiding type conflicts with dplyr::case_when

Question

I am trying to use dplyr::case_when within dplyr::mutate to create a new variable, setting some values to missing and recoding other values at the same time.

However, if I try to set values to NA, I get an error saying that the variable new cannot be created because NA is logical:

Error in mutate_impl(.data, dots) :
Evaluation error: must be type double, not logical.

Is there a way to set values to NA in a non-logical vector of a data frame using this approach?

library(dplyr)    

# Create data
df <- data.frame(old = 1:3)

# Create new variable
df <- df %>% dplyr::mutate(new = dplyr::case_when(old == 1 ~ 5,
                                                  old == 2 ~ NA,
                                                  TRUE ~ old))

# Desired output
c(5, NA, 3)

Recommended answer

As stated in ?case_when:

All RHSs must evaluate to the same type of vector.
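To see why the call in the question violates this rule, it can help to inspect the type of each right-hand side. The sketch below uses only base R; the variable old is a stand-in for df$old, and rhs_types is a name introduced here purely for illustration.

```r
# Stand-in for df$old from the question (1:3 creates an integer vector)
old <- 1:3

# Type of each right-hand side used in the question's case_when() call
rhs_types <- c(typeof(5), typeof(NA), typeof(old))
print(rhs_types)  # "double" "logical" "integer"

# Base R's c() silently coerces mixed types to a common one, which is
# why the desired output c(5, NA, 3) can be written without any error
combined <- c(5, NA, old[3])
print(typeof(combined))  # "double"
```

The three right-hand sides have three different types, so a case_when that insists on identical types rejects the call, even though c() would quietly promote everything to double.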

You actually have two possibilities:

1) Create new as a numeric vector

df <- df %>% mutate(new = case_when(old == 1 ~ 5,
                                    old == 2 ~ NA_real_,
                                    TRUE ~ as.numeric(old)))

Note that NA_real_ is the numeric (double) version of NA, and that you must convert old to numeric because 1:3 created it as an integer in your original data frame.

You get:

str(df)
# 'data.frame': 3 obs. of  2 variables:
# $ old: int  1 2 3
# $ new: num  5 NA 3
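As a quick illustration of the typed missing values involved, the following base R sketch prints the type of plain NA next to its typed counterparts, and shows the effect of as.numeric on an integer vector. The names na_types and converted are introduced here only for the example.

```r
# Plain NA is logical; the typed constants carry a specific vector type
na_types <- c(
  plain     = typeof(NA),
  real      = typeof(NA_real_),
  integer   = typeof(NA_integer_),
  character = typeof(NA_character_)
)
print(na_types)

# as.numeric() turns the integer vector 1:3 into a double vector,
# matching the type of the literal 5 and of NA_real_
converted <- as.numeric(1:3)
print(typeof(converted))  # "double"
```

Choosing the NA constant that matches the other right-hand sides is what keeps every branch of case_when on the same type.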

2) Create new as an integer vector

df <- df %>% mutate(new = case_when(old == 1 ~ 5L,
                                    old == 2 ~ NA_integer_,
                                    TRUE ~ old))

Here, 5L forces 5 into the integer type, and NA_integer_ is the integer version of NA.

So this time new is an integer:

str(df)
# 'data.frame': 3 obs. of  2 variables:
# $ old: int  1 2 3
# $ new: int  5 NA 3
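A note for readers on newer package versions: to the best of my knowledge, dplyr 1.1.0 relaxed this requirement so that case_when finds a common type for the right-hand sides, which means a plain NA is accepted there. The sketch below assumes that behaviour and is guarded by a version check, so on older versions it simply does nothing; the .default argument is also assumed to be available from 1.1.0 onward as a replacement for the TRUE ~ branch.

```r
df <- data.frame(old = 1:3)

# Assumption: dplyr >= 1.1.0 casts the RHS values to a common type.
# On older versions the unguarded call would raise the type error
# shown in the question, so the guard skips it entirely.
has_new_dplyr <- requireNamespace("dplyr", quietly = TRUE) &&
  packageVersion("dplyr") >= "1.1.0"

if (has_new_dplyr) {
  df <- dplyr::mutate(df, new = dplyr::case_when(old == 1 ~ 5,
                                                 old == 2 ~ NA,
                                                 .default = old))
  str(df)
}
```

The explicit NA_real_ / NA_integer_ versions above remain valid on both old and new versions, so they are the safer choice when the installed dplyr version is unknown.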
