Mutate multiple variables to create multiple new variables

Question

Let's say I have a tibble where I need to take multiple variables and mutate them into multiple new variables.

As an example, here is a simple tibble:

library(dplyr)  # provides mutate(), %>% and a re-exported tribble()

tb <- tribble(
  ~x, ~y1, ~y2, ~y3, ~z,
  1, 2, 4, 6, 2,
  2, 1, 2, 3, 3,
  3, 6, 4, 2, 1
)

I want to subtract variable z from every variable with a name starting with "y", and mutate the results as new variables of tb. Also, suppose I don't know how many "y" variables I have. I want the solution to fit nicely within tidyverse / dplyr workflow.

In essence, I don't understand how to mutate multiple variables into multiple new variables. I'm not sure if you can use mutate in this instance? I've tried mutate_if, but I don't think I'm using it right (and I get an error):

tb %>% mutate_if(starts_with("y"), funs(.-z))

#Error: No tidyselect variables were registered

Thanks in advance!

Solution

Because you are selecting columns by name, you need mutate_at() rather than mutate_if(), which selects columns by testing the values within them. The name selector also has to be wrapped in vars():

tb %>% mutate_at(vars(starts_with("y")), funs(. - z))
#> # A tibble: 3 x 5
#>       x    y1    y2    y3     z
#>   <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     1     0     2     4     2
#> 2     2    -2    -1     0     3
#> 3     3     5     3     1     1
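
For contrast, here is a minimal sketch of the difference between the two selectors. The extra character column `label` and the names `tb_chr`, `by_type` and `by_name` are made up for illustration: mutate_if() picks columns with a predicate such as is.numeric, while mutate_at() picks them by name.

```r
library(dplyr)

tb <- tribble(
  ~x, ~y1, ~y2, ~y3, ~z,
  1, 2, 4, 6, 2,
  2, 1, 2, 3, 3,
  3, 6, 4, 2, 1
)

# Hypothetical character column, so the predicate has something to skip
tb_chr <- tb %>% mutate(label = c("a", "b", "c"))

# mutate_if(): columns are chosen by a predicate on their contents
by_type <- tb_chr %>% mutate_if(is.numeric, ~ . * 10)

# mutate_at(): columns are chosen by name, wrapped in vars()
by_name <- tb_chr %>% mutate_at(vars(starts_with("y")), ~ . - z)
```

In `by_type` every numeric column is scaled and `label` is left alone; in `by_name` only y1, y2 and y3 change.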

To create new columns instead of overwriting the existing ones, give the function a name inside funs(); the name is appended to each column name as a suffix:

# add suffix
tb %>% mutate_at(vars(starts_with("y")), funs(mod = . - z))
#> # A tibble: 3 x 8
#>       x    y1    y2    y3     z y1_mod y2_mod y3_mod
#>   <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>  <dbl>
#> 1     1     2     4     6     2      0      2      4
#> 2     2     1     2     3     3     -2     -1      0
#> 3     3     6     4     2     1      5      3      1

# remove suffix, add prefix
tb %>%
  mutate_at(vars(starts_with("y")),  funs(mod = . - z)) %>%
  rename_at(vars(ends_with("_mod")), funs(paste("mod", gsub("_mod", "", .), sep = "_")))
#> # A tibble: 3 x 8
#>       x    y1    y2    y3     z mod_y1 mod_y2 mod_y3
#>   <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>  <dbl>
#> 1     1     2     4     6     2      0      2      4
#> 2     2     1     2     3     3     -2     -1      0
#> 3     3     6     4     2     1      5      3      1


Edit: As of dplyr 0.8.0, funs() is deprecated (source1 & source2); use list() with formula lambdas instead:

tb %>% mutate_at(vars(starts_with("y")), list(~ . - z))
#> # A tibble: 3 x 5
#>       x    y1    y2    y3     z
#>   <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     1     0     2     4     2
#> 2     2    -2    -1     0     3
#> 3     3     5     3     1     1

tb %>% mutate_at(vars(starts_with("y")), list(mod = ~ . - z))
#> # A tibble: 3 x 8
#>       x    y1    y2    y3     z y1_mod y2_mod y3_mod
#>   <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>  <dbl>
#> 1     1     2     4     6     2      0      2      4
#> 2     2     1     2     3     3     -2     -1      0
#> 3     3     6     4     2     1      5      3      1

tb %>%
  mutate_at(vars(starts_with("y")),  list(mod = ~ . - z)) %>%
  rename_at(vars(ends_with("_mod")), list(~ paste("mod", gsub("_mod", "", .), sep = "_")))
#> # A tibble: 3 x 8
#>       x    y1    y2    y3     z mod_y1 mod_y2 mod_y3
#>   <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>  <dbl>
#> 1     1     2     4     6     2      0      2      4
#> 2     2     1     2     3     3     -2     -1      0
#> 3     3     6     4     2     1      5      3      1
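
As a possible alternative for the renaming step, dplyr 1.0.0 and later also provide rename_with(), which takes the renaming function first and the column selection second. The sketch below is not part of the original answer; it assumes the same `tb` as above, and `res` is just an illustrative name.

```r
library(dplyr)

tb <- tribble(
  ~x, ~y1, ~y2, ~y3, ~z,
  1, 2, 4, 6, 2,
  2, 1, 2, 3, 3,
  3, 6, 4, 2, 1
)

res <- tb %>%
  mutate_at(vars(starts_with("y")), list(mod = ~ . - z)) %>%
  # strip the "_mod" suffix and put "mod_" in front instead
  rename_with(~ paste0("mod_", sub("_mod$", "", .x)), ends_with("_mod"))

names(res)
```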


Edit 2: dplyr 1.0.0+ has the across() function, which simplifies this task even further:

Basic usage

across() has two primary arguments:

  • The first argument, .cols, selects the columns you want to operate on. It uses tidy selection (like select()) so you can pick variables by position, name, and type.

  • The second argument, .fns, is a function or list of functions to apply to each column. This can also be a purrr style formula (or list of formulas) like ~ .x / 2. (This argument is optional, and you can omit it if you just want to get the underlying data; you'll see that technique used in vignette("rowwise").)

# Control how the names are created with the `.names` argument which 
# takes a [glue](http://glue.tidyverse.org/) spec:
tb %>% 
  mutate(
    across(starts_with("y"), ~ .x - z, .names = "mod_{col}")
  )
#> # A tibble: 3 x 8
#>       x    y1    y2    y3     z mod_y1 mod_y2 mod_y3
#>   <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>  <dbl>
#> 1     1     2     4     6     2      0      2      4
#> 2     2     1     2     3     3     -2     -1      0
#> 3     3     6     4     2     1      5      3      1

tb %>% 
  mutate(
    across(num_range(prefix = "y", range = 1:3), ~ .x - z, .names = "mod_{col}")
  )
#> # A tibble: 3 x 8
#>       x    y1    y2    y3     z mod_y1 mod_y2 mod_y3
#>   <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>  <dbl>
#> 1     1     2     4     6     2      0      2      4
#> 2     2     1     2     3     3     -2     -1      0
#> 3     3     6     4     2     1      5      3      1

### Multiple functions
tb %>% 
  mutate(
    across(c(matches("x"), contains("z")), ~ max(.x, na.rm = TRUE), .names = "max_{col}"),
    across(c(y1:y3), ~ .x - z, .names = "mod_{col}")
  )
#> # A tibble: 3 x 10
#>       x    y1    y2    y3     z max_x max_z mod_y1 mod_y2 mod_y3
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>  <dbl>
#> 1     1     2     4     6     2     3     3      0      2      4
#> 2     2     1     2     3     3     3     3     -2     -1      0
#> 3     3     6     4     2     1     3     3      5      3      1
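
Since .fns can also be a list of functions, several transformations can go into a single across() call. The following is a minimal sketch rather than part of the original answer: the `ratio` function and the name `res` are made up for illustration, and the glue placeholders are written as `{.fn}` and `{.col}`, the spelling used in the current dplyr documentation.

```r
library(dplyr)

tb <- tribble(
  ~x, ~y1, ~y2, ~y3, ~z,
  1, 2, 4, 6, 2,
  2, 1, 2, 3, 3,
  3, 6, 4, 2, 1
)

res <- tb %>%
  mutate(
    across(
      starts_with("y"),
      # named list: each name becomes {.fn} in the new column names
      list(mod = ~ .x - z, ratio = ~ .x / z),
      .names = "{.fn}_{.col}"
    )
  )

names(res)
```

The new columns are grouped by source column, so the results for y1 (mod_y1, ratio_y1) come before those for y2 and y3.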

Created on 2018-10-29 by the reprex package (v0.2.1)
