mutate_each / summarise_each in dplyr: how do I select certain columns and give new names to mutated columns?

Question

I'm a bit confused about the dplyr verb mutate_each.

It's pretty straightforward to use the basic mutate to transform a column of data into, say, z-scores, and create a new column in your data.frame (here with the name z_score_data):

newDF <- DF %>%
  select(one_column) %>%
  mutate(z_score_data = (one_column - mean(one_column)) / sd(one_column))

However, since I have many columns of data I'd like to transform, it appears I should probably use the mutate_each verb.

newDF <- DF %>%
     mutate_each(funs(scale))

So far so good. But as of yet I haven't been able to figure out:

  1. How can I give these new columns appropriate names, like I can in mutate?
  2. How can I select certain columns that I wish to mutate, like I did with select in the first case?

Thanks for your help.
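For reference, a z-score subtracts the mean and then divides by the standard deviation, and that is exactly what base R's scale() does to each column. This is why mutate_each(funs(scale)) produces z-scores. Here is a quick base-R check on a toy vector (the values are made up for illustration and are not from the question):

```r
# Toy numeric vector standing in for `one_column` (illustrative values only)
one_column <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Manual z-score: centre on the mean, then divide by the standard deviation
z_manual <- (one_column - mean(one_column)) / sd(one_column)

# scale() centres and scales the same way but returns a one-column matrix,
# so as.vector() drops the matrix shape and attributes for comparison
z_scale <- as.vector(scale(one_column))

all.equal(z_manual, z_scale)
# [1] TRUE
```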

Solution

Update for dplyr >= 0.4.3.9000

In the dplyr development version 0.4.3.9000 (at time of writing), naming inside mutate_each and summarise_each has been simplified as noted in the News:

The naming behaviour of summarise_each() and mutate_each() has been tweaked so that you can force inclusion of both the function and the variable name: summarise_each(mtcars, funs(mean = mean), everything())

This is mainly important if you want to apply only 1 function inside mutate_each / summarise_each and you want to give the resulting columns new names.

To show the difference, here's the output from dplyr 0.4.3.9000 using the new naming functionality, in contrast to option a.2 below:

library(dplyr) # >= 0.4.3.9000
iris %>% mutate_each(funs(mysum = sum(.)), -Species) %>% head()
#  Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Length_mysum Sepal.Width_mysum
#1          5.1         3.5          1.4         0.2  setosa              876.5             458.6
#2          4.9         3.0          1.4         0.2  setosa              876.5             458.6
#3          4.7         3.2          1.3         0.2  setosa              876.5             458.6
#4          4.6         3.1          1.5         0.2  setosa              876.5             458.6
#5          5.0         3.6          1.4         0.2  setosa              876.5             458.6
#6          5.4         3.9          1.7         0.4  setosa              876.5             458.6
#  Petal.Length_mysum Petal.Width_mysum
#1              563.7             179.9
#2              563.7             179.9
#3              563.7             179.9
#4              563.7             179.9
#5              563.7             179.9
#6              563.7             179.9
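Later dplyr releases deprecated mutate_each() and funs(). Since dplyr 1.0.0 the suggested replacement is across() inside mutate() or summarise(). The sketch below assumes a current dplyr installation and is a modern substitute, not the API this answer was written against. In this version, the funs(mysum = sum(.)) example above becomes a named list of functions. The list name is used as the suffix, and the original columns are kept:

```r
library(dplyr)  # assumes dplyr >= 1.0.0, where across() is available

# With a named list of functions, the default .names template is
# "{.col}_{.fn}", so new columns such as Sepal.Length_mysum are added
# and the original columns are left unchanged
iris_mysum <- iris %>%
  mutate(across(-Species, list(mysum = sum)))

names(iris_mysum)
```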

If you don't supply new names and you only supply 1 function, dplyr will change the existing columns (as it did in previous versions):

iris %>% mutate_each(funs(sum), -Species) %>% head()
#  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1        876.5       458.6        563.7       179.9  setosa
#2        876.5       458.6        563.7       179.9  setosa
#3        876.5       458.6        563.7       179.9  setosa
#4        876.5       458.6        563.7       179.9  setosa
#5        876.5       458.6        563.7       179.9  setosa
#6        876.5       458.6        563.7       179.9  setosa
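across() behaves the same way in this case. The sketch below assumes dplyr >= 1.0.0 and is not the mutate_each() API from this answer. Passing a single, unnamed function keeps the default name template "{.col}", so the results overwrite the selected columns in place:

```r
library(dplyr)  # assumes dplyr >= 1.0.0

# One unnamed function: each selected column is replaced by its sum,
# and Species (excluded by -Species) is untouched
iris_overwritten <- iris %>%
  mutate(across(-Species, sum))

head(iris_overwritten, 3)
```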

I assume that this new functionality will be available via CRAN in the next release version 0.4.4.


dplyr versions <= 0.4.3:

How can I give these new columns appropriate names, like I can in mutate?

a) 1 function applied in mutate_each/summarise_each

If you apply only 1 function inside mutate_each or summarise_each, the existing columns are transformed and keep their original names, unless you supply a named vector to mutate_each_/summarise_each_ (see option a.4).

Here are some examples:

a.1 only 1 function -> will keep the existing names

iris %>% mutate_each(funs(sum), -Species) %>% head()
#  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1          876         459          564         180  setosa
#2          876         459          564         180  setosa
#3          876         459          564         180  setosa
#4          876         459          564         180  setosa
#5          876         459          564         180  setosa
#6          876         459          564         180  setosa

a.2 The same happens even if you specify a new column name extension (the extension is ignored):

iris %>% mutate_each(funs(mysum = sum(.)), -Species) %>% head()
#  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1          876         459          564         180  setosa
#2          876         459          564         180  setosa
#3          876         459          564         180  setosa
#4          876         459          564         180  setosa
#5          876         459          564         180  setosa
#6          876         459          564         180  setosa

a.3 Manually specify a new name per column (practical only for a few columns):

iris %>% mutate_each(funs(sum), SLsum = Sepal.Length, SWsum = Sepal.Width, -Species) %>% head()
#  Sepal.Length Sepal.Width Petal.Length Petal.Width Species SLsum SWsum
#1          5.1         3.5          1.4         0.2  setosa   876   459
#2          4.9         3.0          1.4         0.2  setosa   876   459
#3          4.7         3.2          1.3         0.2  setosa   876   459
#4          4.6         3.1          1.5         0.2  setosa   876   459
#5          5.0         3.6          1.4         0.2  setosa   876   459
#6          5.4         3.9          1.7         0.4  setosa   876   459
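When you only need a handful of named results, you do not need mutate_each at all. Plain mutate() can name each new column directly. This is a sketch using only mutate(), and the original columns are left in place:

```r
library(dplyr)

# Name each summary column explicitly; sum() returns one value,
# which mutate() recycles to every row
iris_named <- iris %>%
  mutate(SLsum = sum(Sepal.Length),
         SWsum = sum(Sepal.Width))

head(iris_named, 3)
```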

a.4 Use a named vector to create additional columns with new names:

case 1: keep original columns

In contrast to options a.1, a.2 and a.3, dplyr will keep the existing columns unchanged and create new columns in this approach. The names of the new columns equal the names of the named vector you create in advance (vars in this case).

vars <- names(iris)[1:2]  # choose which columns should be mutated
vars <- setNames(vars, paste0(vars, "_sum")) # create new column names
iris %>% mutate_each_(funs(sum), vars) %>% head 
#  Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Length_sum Sepal.Width_sum
#1          5.1         3.5          1.4         0.2  setosa            876.5           458.6
#2          4.9         3.0          1.4         0.2  setosa            876.5           458.6
#3          4.7         3.2          1.3         0.2  setosa            876.5           458.6
#4          4.6         3.1          1.5         0.2  setosa            876.5           458.6
#5          5.0         3.6          1.4         0.2  setosa            876.5           458.6
#6          5.4         3.9          1.7         0.4  setosa            876.5           458.6
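Assuming dplyr >= 1.0.0, one way to get the same "keep the originals, add renamed copies" result is to use a character vector with all_of() and build the names with the .names template of across(). This replaces the named vector and is a modern alternative, not the mutate_each_() mechanism described above:

```r
library(dplyr)  # assumes dplyr >= 1.0.0 (across() and all_of())

vars <- c("Sepal.Length", "Sepal.Width")  # columns to summarise

# "{.col}_sum" builds each new name from the source column name;
# the source columns themselves are kept
iris_sum <- iris %>%
  mutate(across(all_of(vars), sum, .names = "{.col}_sum"))

head(iris_sum, 3)
```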

case 2: remove original columns

As you can see, this approach keeps the existing columns unchanged and adds new columns with specified names. In case you don't want to keep the original columns, but just the newly created columns (and the other columns) you can just add a select statement afterwards:

iris %>% mutate_each_(funs(sum), vars) %>% select(-one_of(vars)) %>% head
#  Petal.Length Petal.Width Species Sepal.Length_sum Sepal.Width_sum
#1          1.4         0.2  setosa            876.5           458.6
#2          1.4         0.2  setosa            876.5           458.6
#3          1.3         0.2  setosa            876.5           458.6
#4          1.5         0.2  setosa            876.5           458.6
#5          1.4         0.2  setosa            876.5           458.6
#6          1.7         0.4  setosa            876.5           458.6
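With across() (assuming dplyr >= 1.0.0), dropping the source columns works the same way as in this answer: add the renamed results, then remove the originals with select(). all_of() takes the place of one_of() here:

```r
library(dplyr)  # assumes dplyr >= 1.0.0

vars <- c("Sepal.Length", "Sepal.Width")

# Add the "_sum" columns, then drop the columns they were computed from
iris_sum_only <- iris %>%
  mutate(across(all_of(vars), sum, .names = "{.col}_sum")) %>%
  select(-all_of(vars))

head(iris_sum_only, 3)
```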

b) more than 1 function applied in mutate_each/summarise_each

b.1 Let dplyr figure out new names

If you apply more than 1 function, you can let dplyr work out the names by itself (and it will keep the existing columns):

iris %>% mutate_each(funs(sum, mean), -Species) %>% head()
#  Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Length_sum Sepal.Width_sum Petal.Length_sum
#1          5.1         3.5          1.4         0.2  setosa              876             459              564
#2          4.9         3.0          1.4         0.2  setosa              876             459              564
#3          4.7         3.2          1.3         0.2  setosa              876             459              564
#4          4.6         3.1          1.5         0.2  setosa              876             459              564
#5          5.0         3.6          1.4         0.2  setosa              876             459              564
#6          5.4         3.9          1.7         0.4  setosa              876             459              564
#  Petal.Width_sum Sepal.Length_mean Sepal.Width_mean Petal.Length_mean Petal.Width_mean
#1             180              5.84             3.06              3.76              1.2
#2             180              5.84             3.06              3.76              1.2
#3             180              5.84             3.06              3.76              1.2
#4             180              5.84             3.06              3.76              1.2
#5             180              5.84             3.06              3.76              1.2
#6             180              5.84             3.06              3.76              1.2
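The same naming rule carries over to the summarise_each() case. The sketch below assumes dplyr >= 1.0.0 and uses summarise() with across() as a modern substitute. A named list of functions gives one row with one column per variable/function pair, and each column is named "{.col}_{.fn}" by default:

```r
library(dplyr)  # assumes dplyr >= 1.0.0

# Ungrouped summarise(): a single row with Sepal.Length_sum,
# Sepal.Length_mean, ..., Petal.Width_mean
iris_stats <- iris %>%
  summarise(across(-Species, list(sum = sum, mean = mean)))

iris_stats
```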

b.2 Manually specify new column names

Another option, when using more than 1 function, is to specify the column name extension on your own:

iris %>% mutate_each(funs(MySum = sum(.), MyMean = mean(.)), -Species) %>% head()
#  Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Length_MySum Sepal.Width_MySum Petal.Length_MySum
#1          5.1         3.5          1.4         0.2  setosa                876               459                564
#2          4.9         3.0          1.4         0.2  setosa                876               459                564
#3          4.7         3.2          1.3         0.2  setosa                876               459                564
#4          4.6         3.1          1.5         0.2  setosa                876               459                564
#5          5.0         3.6          1.4         0.2  setosa                876               459                564
#6          5.4         3.9          1.7         0.4  setosa                876               459                564
#  Petal.Width_MySum Sepal.Length_MyMean Sepal.Width_MyMean Petal.Length_MyMean Petal.Width_MyMean
#1               180                5.84               3.06                3.76                1.2
#2               180                5.84               3.06                3.76                1.2
#3               180                5.84               3.06                3.76                1.2
#4               180                5.84               3.06                3.76                1.2
#5               180                5.84               3.06                3.76                1.2
#6               180                5.84               3.06                3.76                1.2
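In across() (assuming dplyr >= 1.0.0), the list names provide the custom extensions, and .names gives full control over the pattern. Putting the function label first, as below, is an arbitrary choice made only to show the template:

```r
library(dplyr)  # assumes dplyr >= 1.0.0

# List names (MySum, MyMean) become {.fn}; .names sets the overall layout
iris_custom <- iris %>%
  mutate(across(-Species,
                list(MySum = sum, MyMean = mean),
                .names = "{.fn}_{.col}"))

names(iris_custom)
```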

How can I select certain columns that I wish to mutate, like I did with select in the first case?

You can do that by listing the columns to be mutated (or, with a minus sign, left out), as here (mutate Sepal.Length, but not Species):

iris %>% mutate_each(funs(sum), Sepal.Length, -Species) %>% head()

In addition, you can use special helper functions to select the columns to be mutated, for example all columns that start with or contain a certain word:

iris %>% mutate_each(funs(sum), contains("Sepal"),  -Species) %>% head()

For more information on those functions, see ?mutate_each and ?select.
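The same selection helpers work as the first argument of across(). Assuming dplyr >= 1.0.0, here is a sketch that z-scores every column whose name starts with "Sepal", which was the original goal of the question. Species is never matched, so it does not need to be excluded:

```r
library(dplyr)  # assumes dplyr >= 1.0.0

# A purrr-style lambda (~ .x) computes the z-score for each selected column;
# a single unnamed function overwrites those columns in place
iris_sepal <- iris %>%
  mutate(across(starts_with("Sepal"), ~ (.x - mean(.x)) / sd(.x)))

head(iris_sepal, 3)
```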

Edit 1 after comment:

If you want to use standard evaluation, dplyr supplies SE versions of most functions, whose names end with an additional "_". So in this case you would use:

x <- c("Sepal.Width", "Sepal.Length") # vector of column names 
iris %>% mutate_each_(funs(sum), x) %>% head()

Notice the mutate_each_ I used here.
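Later dplyr releases deprecated the underscore (SE) variants. Assuming dplyr >= 1.0.0, a column-name vector held as strings can be passed through all_of() instead. all_of() raises an error if any listed name is not a column of the data:

```r
library(dplyr)  # assumes dplyr >= 1.0.0

x <- c("Sepal.Width", "Sepal.Length")  # vector of column names as strings

# all_of() turns the character vector into a column selection for across()
iris_x <- iris %>%
  mutate(across(all_of(x), sum))

head(iris_x, 3)
```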


Edit 2: updated with option a.4
