Create new variables with mutate_at while keeping the original ones

Problem description

Consider this simple example:

library(dplyr)

# tibble() replaces the deprecated data_frame()
dataframe <- tibble(helloo = c(1,2,3,4,5,6),
                    ooooHH = c(1,1,1,2,2,2),
                    ahaaa = c(200,400,120,300,100,100))

# A tibble: 6 x 3
  helloo ooooHH ahaaa
   <dbl>  <dbl> <dbl>
1      1      1   200
2      2      1   400
3      3      1   120
4      4      2   300
5      5      2   100
6      6      2   100

Here I want to apply the function ntile to all the columns whose names contain oo, and I would like the new columns to be named cat_ followed by the corresponding column name (for example, cat_helloo).

I know I can do this, but it overwrites the original columns:

dataframe %>% mutate_at(vars(contains('oo')), .funs = funs(ntile(., 2)))
# A tibble: 6 x 3
  helloo ooooHH ahaaa
   <int>  <int> <dbl>
1      1      1   200
2      1      1   400
3      1      1   120
4      2      2   300
5      2      2   100
6      2      2   100

But what I need is this:

# A tibble: 6 x 5
  helloo ooooHH ahaaa cat_helloo cat_ooooHH
   <dbl>  <dbl> <dbl>      <int>      <int>
1      1      1   200          1          1
2      2      1   400          1          1
3      3      1   120          1          1
4      4      2   300          2          2
5      5      2   100          2          2
6      6      2   100          2          2

Is there a solution that does NOT require storing the intermediate data and merging it back into the original dataframe?

Solution

Update 2020-06 for dplyr 1.0.0

Starting in dplyr 1.0.0, the across() function supersedes the "scoped variants" of functions such as mutate_at(). The code should look pretty familiar within across(), which is nested inside mutate().

Adding a name to the function(s) you give in the list adds the function name as a suffix.

dataframe %>%
     mutate( across(contains('oo'), 
                    .fns = list(cat = ~ntile(., 2))) )

# A tibble: 6 x 5
  helloo ooooHH ahaaa helloo_cat ooooHH_cat
   <dbl>  <dbl> <dbl>      <int>      <int>
1      1      1   200          1          1
2      2      1   400          1          1
3      3      1   120          1          1
4      4      2   300          2          2
5      5      2   100          2          2
6      6      2   100          2          2
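
When the list holds more than one named function, each name becomes its own suffix. A minimal sketch, assuming dplyr 1.0.0 or later; the second function and its name tercile are illustrative additions, not part of the original answer:

```r
library(dplyr)

dataframe <- tibble(helloo = c(1, 2, 3, 4, 5, 6),
                    ooooHH = c(1, 1, 1, 2, 2, 2),
                    ahaaa  = c(200, 400, 120, 300, 100, 100))

# Two named functions: every selected column gets one new column per function,
# named <column>_<function name> by default. `tercile` is a hypothetical name.
result <- dataframe %>%
  mutate(across(contains("oo"),
                .fns = list(cat = ~ntile(.x, 2),
                            tercile = ~ntile(.x, 3))))

names(result)
```

The original helloo and ooooHH columns are kept because every function in the list is named; an unnamed single function would overwrite them instead.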

Changing the new column names is a little easier in 1.0.0 with the .names argument in across(). Here is an example of adding the function name as a prefix instead of a suffix. The .names argument uses glue syntax, where {fn} stands for the function name and {col} for the column name.

dataframe %>%
     mutate( across(contains('oo'), 
                    .fns = list(cat = ~ntile(., 2)),
                    .names = "{fn}_{col}" ) )

# A tibble: 6 x 5
  helloo ooooHH ahaaa cat_helloo cat_ooooHH
   <dbl>  <dbl> <dbl>      <int>      <int>
1      1      1   200          1          1
2      2      1   400          1          1
3      3      1   120          1          1
4      4      2   300          2          2
5      5      2   100          2          2
6      6      2   100          2          2
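
If only one function is applied, the prefix can also be written literally in the glue string, so no named list is needed. A sketch under the assumption of a recent dplyr release, where the documentation spells the placeholder {.col} (the {col} form used above is the 1.0.0 spelling):

```r
library(dplyr)

dataframe <- tibble(helloo = c(1, 2, 3, 4, 5, 6),
                    ooooHH = c(1, 1, 1, 2, 2, 2),
                    ahaaa  = c(200, 400, 120, 300, 100, 100))

# A single unnamed function; the literal "cat_" in .names supplies the prefix,
# and because the names differ from the inputs, the originals are kept.
result <- dataframe %>%
  mutate(across(contains("oo"), ~ntile(.x, 2), .names = "cat_{.col}"))

names(result)
```

This trades the flexibility of a named list for a shorter call, which is reasonable when there is exactly one transformation.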

Original answer with mutate_at()

Edited to reflect changes in dplyr. As of dplyr 0.8.0, funs() is deprecated and list() with ~ should be used instead.

You can name the functions in the list you pass to .funs; each new variable is then created with that name attached as a suffix, and the original columns are kept.

dataframe %>% mutate_at(vars(contains('oo')), .funs = list(cat = ~ntile(., 2)))

# A tibble: 6 x 5
  helloo ooooHH ahaaa helloo_cat ooooHH_cat
   <dbl>  <dbl> <dbl>      <int>      <int>
1      1      1   200          1          1
2      2      1   400          1          1
3      3      1   120          1          1
4      4      2   300          2          2
5      5      2   100          2          2
6      6      2   100          2          2

If you want it as a prefix instead, you could then use rename_at to change the names.

dataframe %>% 
     mutate_at(vars(contains('oo')), .funs = list(cat = ~ntile(., 2))) %>%
     rename_at( vars( contains( "_cat") ), list( ~paste("cat", gsub("_cat", "", .), sep = "_") ) )

# A tibble: 6 x 5
  helloo ooooHH ahaaa cat_helloo cat_ooooHH
   <dbl>  <dbl> <dbl>      <int>      <int>
1      1      1   200          1          1
2      2      1   400          1          1
3      3      1   120          1          1
4      4      2   300          2          2
5      5      2   100          2          2
6      6      2   100          2          2
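
For readers on dplyr 1.0.0 or later who still use mutate_at(), the renaming step could also be written with rename_with(), which supersedes rename_at(). This is a sketch of an alternative, not the original answer's code:

```r
library(dplyr)

dataframe <- tibble(helloo = c(1, 2, 3, 4, 5, 6),
                    ooooHH = c(1, 1, 1, 2, 2, 2),
                    ahaaa  = c(200, 400, 120, 300, 100, 100))

result <- dataframe %>%
  mutate_at(vars(contains("oo")), .funs = list(cat = ~ntile(., 2))) %>%
  # Strip the trailing "_cat" and put "cat_" in front instead
  rename_with(~paste0("cat_", sub("_cat$", "", .x)), .cols = ends_with("_cat"))

names(result)
```

Anchoring the pattern with $ and selecting with ends_with() limits the rename to the suffix, so a column that merely contains "_cat" somewhere in the middle of its name is left alone.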

Previous code with funs() from earlier versions of dplyr:

dataframe %>% 
     mutate_at(vars(contains('oo')), .funs = funs(cat = ntile(., 2))) %>%
     rename_at( vars( contains( "_cat") ), funs( paste("cat", gsub("_cat", "", .), sep = "_") ) )
