Create numerically encoded dummy variables efficiently in R?

Problem description

How can we transform data of the form

df <- structure(list(customer_number = c(3, 3, 1, 1, 3), 
                     item = c("milkshake","burger", "apple", "burger", "water")
                       ), 
                row.names = c(NA, -5L), class = "data.frame")


#   customer_number      item
# 1               3 milkshake
# 2               3    burger
# 3               1     apple
# 4               1    burger
# 5               3     water

into numerically encoded dummy variables, with one row per customer, like this:


data.frame(customer_number=c(1,3),
           item_milkshake=c(0,1),
           item_burger=c(1,1),
           item_apple=c(1,0),
           item_water=c(0,1))

#   customer_number item_milkshake item_burger item_apple item_water
# 1               1              0           1          1          0
# 2               3              1           1          0          1

Solution

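We can add a helper column n that is 1 for every row, then pivot the data to wide format so that each item becomes its own column and any customer/item combination that does not occur is filled with 0.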

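library(dplyr)

df %>%
  mutate(n = 1) %>%                 # indicator: 1 for every purchased item
  arrange(customer_number) %>%      # customers in ascending order
  tidyr::pivot_wider(names_from = item, values_from = n,
                     # customer/item pairs that never occur become 0
                     values_fill = list(n = 0), names_prefix = "item_")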

# A tibble: 2 x 5
#  customer_number item_apple item_burger item_milkshake item_water
#            <dbl>      <dbl>       <dbl>          <dbl>      <dbl>
#1               1          1           1              0          0
#2               3          0           1              1          1
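
If extra packages are not an option, the same result can probably be reached in base R by cross-tabulating customers against items. The following is only a sketch and not part of the original answer; the names counts, dummies and result are illustrative. Counts above 1 are collapsed to 1, so a customer who bought the same item more than once still gets a plain 0/1 indicator.

```r
df <- data.frame(customer_number = c(3, 3, 1, 1, 3),
                 item = c("milkshake", "burger", "apple", "burger", "water"))

# Cross-tabulate: one row per customer, one column per item, cells hold counts
counts <- table(df$customer_number, df$item)

# Convert the contingency table to a data frame of the same shape
dummies <- as.data.frame.matrix(counts)

# Collapse counts to 0/1 indicators stored as numeric
dummies[] <- lapply(dummies, function(x) as.numeric(x > 0))
names(dummies) <- paste0("item_", names(dummies))

# Move the customer ids from the row names into a regular column
result <- cbind(customer_number = as.numeric(rownames(dummies)), dummies)
rownames(result) <- NULL

result
```

The item columns come out in alphabetical order, as in the pivot_wider output above; reorder them with result[, c(...)] if a specific order is needed.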
