Mutating dummy variables in dplyr
Question
I want to create 7 dummy variables, one for each day, using dplyr.
So far I have managed to do it with the to_dummy function from the sjmisc package, but it takes two steps: 1) create a data frame of dummies, 2) append it to the original data frame.
# Sample data frame
mydf <- data.frame(x = letters[1:9],
                   day = c("Mon", "Tues", "Wed", "Thurs", "Fri", "Sat", "Sun", "Fri", "Mon"))
# 1. Create the 7 dummy variables separately
daysdummy <- sjmisc::to_dummy(mydf$day, suffix = "label")
# 2. Append them to the data frame (bind_cols() comes from dplyr)
mydf <- dplyr::bind_cols(mydf, daysdummy)
> mydf
x day day_Fri day_Mon day_Sat day_Sun day_Thurs day_Tues day_Wed
1 a Mon 0 1 0 0 0 0 0
2 b Tues 0 0 0 0 0 1 0
3 c Wed 0 0 0 0 0 0 1
4 d Thurs 0 0 0 0 1 0 0
5 e Fri 1 0 0 0 0 0 0
6 f Sat 0 0 1 0 0 0 0
7 g Sun 0 0 0 1 0 0 0
8 h Fri 1 0 0 0 0 0 0
9 i Mon 0 1 0 0 0 0 0
My question is whether I can do this in a single workflow using dplyr, adding to_dummy into the pipe, perhaps with mutate?
Recommended answer
If you want to do this with the pipe, you can do something like:
library(dplyr)
library(sjmisc)
mydf %>%
to_dummy(day, suffix = "label") %>%
bind_cols(mydf) %>%
select(x, day, everything())
This returns:
# A tibble: 9 x 9
x day day_Fri day_Mon day_Sat day_Sun day_Thurs day_Tues day_Wed
<fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 a Mon 0. 1. 0. 0. 0. 0. 0.
2 b Tues 0. 0. 0. 0. 0. 1. 0.
3 c Wed 0. 0. 0. 0. 0. 0. 1.
4 d Thurs 0. 0. 0. 0. 1. 0. 0.
5 e Fri 1. 0. 0. 0. 0. 0. 0.
6 f Sat 0. 0. 1. 0. 0. 0. 0.
7 g Sun 0. 0. 0. 1. 0. 0. 0.
8 h Fri 1. 0. 0. 0. 0. 0. 0.
9 i Mon 0. 1. 0. 0. 0. 0. 0.
Using dplyr and tidyr, we could do:
library(dplyr)
library(tidyr)
mydf %>%
mutate(var = 1) %>%
spread(day, var, fill = 0, sep = "_") %>%
left_join(mydf, by = "x") %>%
select(x, day, everything())
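In recent tidyr releases spread() is superseded by pivot_wider(), so the same idea could be sketched as below. This is only a sketch, assuming tidyr 1.1.0 or later (for a scalar values_fill); the helper column names var and day_copy are made up for illustration. Copying day before pivoting keeps the original column, so no join back to mydf is needed. Note that the dummy columns come out in order of first appearance rather than alphabetically.

```r
library(dplyr)
library(tidyr)

# Same sample data as in the question
mydf <- data.frame(
  x   = letters[1:9],
  day = c("Mon", "Tues", "Wed", "Thurs", "Fri", "Sat", "Sun", "Fri", "Mon")
)

result <- mydf %>%
  # var marks "this row has this day"; day_copy is pivoted so that day itself survives
  mutate(var = 1, day_copy = day) %>%
  pivot_wider(
    names_from   = day_copy,
    values_from  = var,
    values_fill  = 0,       # days that do not apply to a row get 0
    names_prefix = "day_"
  )

result
```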
And with base R we could do something like:
as.data.frame.matrix(table(rep(mydf$x, lengths(mydf$day)), unlist(mydf$day)))
This returns:
Fri Mon Sat Sun Thurs Tues Wed
a 0 1 0 0 0 0 0
b 0 0 0 0 0 1 0
c 0 0 0 0 0 0 1
d 0 0 0 0 1 0 0
e 1 0 0 0 0 0 0
f 0 0 1 0 0 0 0
g 0 0 0 1 0 0 0
h 1 0 0 0 0 0 0
i 0 1 0 0 0 0 0
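The table above holds only the dummies, indexed by x. If the goal is to keep x and day alongside them, as in the question, one possible base R sketch uses model.matrix(). This swaps in a different technique from the table() call above; the names dummies and mydf_base are chosen here for illustration.

```r
# Same sample data as in the question
mydf <- data.frame(
  x   = letters[1:9],
  day = c("Mon", "Tues", "Wed", "Thurs", "Fri", "Sat", "Sun", "Fri", "Mon")
)

# "- 1" drops the intercept, so every level of day gets its own 0/1 column
dummies <- model.matrix(~ day - 1, data = mydf)

# model.matrix() names the columns dayFri, dayMon, ...; insert an underscore
colnames(dummies) <- sub("^day", "day_", colnames(dummies))

# Append the dummy columns to the original data frame
mydf_base <- cbind(mydf, dummies)
mydf_base
```

Here the dummy columns follow the alphabetical order of the factor levels, matching the column order in the sjmisc output.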