r: coding dummy variables based on max value for each month
Question
I want to code a new variable called df$dummy based on the max value in df$var1 for each df$month, where the value will be 1 for the max value and 0 for every other value. See reproducible data set:
df <- data.frame(
  date = seq.Date(from = as.Date("2017-01-01"), by = 7, length.out = 20),
  var1 = rnorm(20, 5, 3)
)
df$month <- as.numeric(strftime(df$date, "%m"))
I'm having trouble conceptualizing the conditions for the function. In Excel I would just use the maxif function and specify my criteria. My attempt below does not work:
df$dummy<- apply(df$var1, MARGIN = 2,
function(x) if_else(max(x) %in% df$month, 1, 0))
It returns this error:
Error in apply(df$var1, MARGIN = 2, function(x) if_else(max(x) %in% df$month, :
dim(X) must have a positive length
How do I code this dummy variable? Is there a viable dplyr solution using mutate_if?
Recommended answer
In dplyr, the key is to use group_by to separate the data frame by month. Then, var1 == max(var1) will operate within each month, as you want. For example:
library(dplyr)

df <- data.frame(
  date = seq.Date(from = as.Date("2017-01-01"), by = 7, length.out = 20),
  var1 = rnorm(20, 5, 3)
)
df$month <- as.numeric(strftime(df$date, "%m"))

df <- df %>%
  group_by(month) %>%                                # evaluate max() within each month
  mutate(dummy = as.integer(var1 == max(var1))) %>%  # 1 for the monthly max, 0 otherwise
  ungroup()
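If you would rather not load dplyr, the same grouped comparison can probably be done in base R. The following is only a sketch, not part of the original answer: it swaps group_by/mutate for ave(), which applies a function within groups and returns a vector aligned with the original rows. The set.seed() call is an assumption added here so the random data is repeatable.

```r
# Base R sketch: flag the row that holds each month's maximum of var1.
set.seed(42)  # assumption: seed added only so the example is repeatable

df <- data.frame(
  date = seq.Date(from = as.Date("2017-01-01"), by = 7, length.out = 20),
  var1 = rnorm(20, 5, 3)
)
df$month <- as.numeric(strftime(df$date, "%m"))

# ave() computes max(var1) within each month and recycles it to every row
# of that month, so the comparison below is evaluated month by month.
df$dummy <- as.integer(df$var1 == ave(df$var1, df$month, FUN = max))

# Sanity check: cross-tabulate month against the flag.
table(df$month, df$dummy)
```

Note that with either approach, exact ties in var1 within a month would all be flagged 1. With continuous random data this is unlikely, but for real data you may want to decide how ties should be handled.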