R: coding dummy variables based on the max value for each month


Question

I want to code a new variable called df$dummy based on the max value in df$var1 for each df$month, where the value will be 1 for the max value and 0 for every other value. See the reproducible data set:

df<- data.frame(date= seq.Date(from = as.Date('2017-01-01'), by= 7, 
                length.out = 20), var1= rnorm(20, 5, 3))

df$month<- as.numeric(strftime(df$date, "%m"))

I'm having trouble conceptualizing the conditions for the function. In Excel I would just use the maxif function and specify my criteria. My attempt below does not work:

df$dummy<- apply(df$var1, MARGIN = 2, 
                 function(x) if_else(max(x) %in% df$month, 1, 0))

It returns this error:

Error in apply(df$var1, MARGIN = 2, function(x) if_else(max(x) %in% df$month,  : 
dim(X) must have a positive length

How do I code this dummy variable? Is there a viable dplyr solution using mutate_if?

Recommended answer

In dplyr, the key is to use group_by to split the data frame by month. Then var1 == max(var1) is evaluated within each month, as you want. mutate_if is not needed here: it chooses which columns to transform, not which rows belong to a group. For example:

library(dplyr)

# Rebuild the example data from the question
df <- data.frame(date = seq.Date(from = as.Date('2017-01-01'), by = 7, length.out = 20),
                 var1 = rnorm(20, 5, 3))
df$month <- as.numeric(strftime(df$date, "%m"))

df <- df %>%
  group_by(month) %>%                                # one group per month
  mutate(dummy = as.integer(var1 == max(var1))) %>%  # 1 for the monthly max, else 0
  ungroup()
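
For readers who would rather stay in base R, the same per-month comparison can be sketched with ave(), which applies a function within each group and returns a vector aligned with the original rows. This is an alternative to the dplyr answer, not part of it. The set.seed call is an addition for reproducibility, and the dim() check is only there to illustrate why the apply() attempt in the question failed: apply() expects an object with dimensions, and df$var1 is a plain vector.

```r
# Same data as in the question; set.seed() is added here (an assumption)
# so that the example gives the same numbers on every run.
set.seed(42)
df <- data.frame(
  date = seq.Date(from = as.Date("2017-01-01"), by = 7, length.out = 20),
  var1 = rnorm(20, 5, 3)
)
df$month <- as.numeric(strftime(df$date, "%m"))

# A plain vector has no dimensions, which is what apply() complained about.
is.null(dim(df$var1))

# ave() returns, for every row, the max of var1 within that row's month.
monthly_max <- ave(df$var1, df$month, FUN = max)

# 1 where the row holds its month's maximum, 0 otherwise.
df$dummy <- as.integer(df$var1 == monthly_max)

# Number of flagged rows per month.
tapply(df$dummy, df$month, sum)
```

As with the dplyr version, the equality test flags every row that ties for the monthly maximum. With continuous data such as rnorm() output, ties are not expected in practice, so each month normally ends up with exactly one 1.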
